TY - GEN
T1 - Efficient Unsupervised Recursive Word Segmentation Using Minimum Description Length
AU - Argamon, S.
AU - Akiva, N.
AU - Amir, A.
AU - Kapah, O.
N1 - Place of conference: Geneva, Switzerland
PY - 2004
Y1 - 2004
N2 - Automatic word segmentation is a basic requirement for unsupervised learning in morphological analysis. In this paper, we formulate a novel recursive method for minimum description length (MDL) word segmentation, whose basic operation is resegmenting the corpus on a prefix (equivalently, a suffix). We derive a local expression for the change in description length under resegmentation, i.e., one which depends only on properties of the specific prefix (not on the rest of the corpus). Such a formulation permits use of a new and efficient algorithm for greedy morphological segmentation of the corpus in a recursive manner. In particular, our method does not restrict words to be segmented only once, into a stem+affix form, as do many extant techniques. Early results for English and Turkish corpora are promising.
UR - https://scholar.google.co.il/scholar?q=Efficient+Unsupervised+Recursive+Word+Segmentation+Using+Minimum+Description+Length%2C+Amir+Amihood+&btnG=&hl=en&as_sdt=0%2C5
M3 - Conference contribution
BT - The 20th International Conference on Computational Linguistics
PB - Association for Computational Linguistics
ER -