Three-dimensional parametrization for parsing morphologically rich languages

Reut Tsarfaty, Khalil Sima’an

Research output: Contribution to conference, Paper, peer-review

11 Scopus citations

Abstract

Current parameters of accurate unlexicalized parsers based on Probabilistic Context-Free Grammars (PCFGs) form a two-dimensional grid in which rewrite events are conditioned on both horizontal (head-outward) and vertical (parental) histories. In Semitic languages, where arguments may move around rather freely and phrase-structures are often shallow, there are additional morphological factors that govern the generation process. Here we propose that agreement features percolated up the parse-tree form a third dimension of parametrization that is orthogonal to the previous two. This dimension differs from mere “state-splits” as it applies to a whole set of categories rather than to individual ones and encodes linguistically motivated co-occurrences between them. This paper presents extensive experiments with extensions of unlexicalized PCFGs for parsing Modern Hebrew in which tuning the parameters in three dimensions gradually leads to improved performance. Our best result introduces a new, stronger, lower bound on the performance of treebank grammars for parsing Modern Hebrew, and is on a par with current results for parsing Modern Standard Arabic obtained by a fully lexicalized parser trained on a much larger treebank.
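To make the three dimensions concrete, here is a minimal sketch of how the conditioning context of each rewrite event might be assembled. It is not the authors' implementation. The helper names (`node`, `percolate`, `events`), the `head` index stored on each node, the feature names, and the toy tree are all assumptions made for illustration. For brevity the horizontal history is taken left to right rather than head-outward.

```python
from collections import Counter

def node(label, children=(), feats=(), head=0):
    # Hypothetical tree node: `head` is the index of the head child,
    # `feats` holds agreement features (only meaningful on preterminals at first).
    return {"label": label, "children": list(children),
            "feats": frozenset(feats), "head": head}

def percolate(t):
    """Bottom-up pass: a phrase inherits the agreement features of its head child."""
    for c in t["children"]:
        percolate(c)
    if t["children"]:
        t["feats"] = t["children"][t["head"]]["feats"]
    return t

def events(t, ancestors=(), v=1, h=1, agr=True):
    """Yield (child_label, context) pairs, one per child-generation event.

    context = (vertical history, horizontal history, agreement features):
      v   -- number of ancestor labels conditioned on (1 = parent only)
      h   -- number of previously generated siblings conditioned on
      agr -- whether the parent's percolated agreement features are included
    """
    kids = t["children"]
    if not kids:
        return
    vertical = (t["label"],) + ancestors[:v - 1]
    feats = tuple(sorted(t["feats"])) if agr else ()
    for i, c in enumerate(kids):
        horizontal = tuple(k["label"] for k in kids[max(0, i - h):i])
        yield c["label"], (vertical, horizontal, feats)
    for c in kids:
        yield from events(c, (t["label"],) + ancestors, v, h, agr)

# Toy clause: a subject NP, and a VP (the head of S) containing a verb and an object NP.
tree = percolate(node("S", [
    node("NP", [node("NN", feats={"fem", "sg"})]),
    node("VP", [node("VB", feats={"fem", "sg"}),
                node("NP", [node("NN", feats={"masc", "pl"})])]),
], head=1))

plain = Counter(events(tree, v=1, h=0, agr=False))   # bare treebank grammar
rich = Counter(events(tree, v=2, h=1, agr=True))     # all three dimensions switched on
```

In this toy setting the two `NN` events share one context under the plain grammar, while the richer setting separates them by both grandparent label and agreement features. Relative-frequency estimates over such counts would give the parameters of the corresponding unlexicalized PCFG.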

Original language: English
Pages: 156-167
Number of pages: 12
State: Published - 2007
Externally published: Yes
Event: 10th International Conference on Parsing Technologies, IWPT 2007 - Prague, Czech Republic
Duration: 23 Jun 2007 to 24 Jun 2007

Bibliographical note

Publisher Copyright:
© 2007 Association for Computational Linguistics.
