Tagging a Hebrew corpus: The case of participles

Meni Adler, Yael Netzer, Yoav Goldberg, David Gabay, Michael Elhadad

Research output: Contribution to journal, Article, peer-reviewed

9 Scopus citations

Abstract

We report on an effort to build a corpus of Modern Hebrew tagged with parts of speech and morphology. We designed a tagset specific to Hebrew while focusing on four aspects: the tagset should be consistent with common linguistic knowledge; there should be maximal agreement among taggers on the tags assigned, so that the annotation stays consistent; the tagset should be useful for machine taggers and learning algorithms; and the tagset should be effective for applications relying on the tags as input features. In this paper, we illustrate these issues by explaining our decision to introduce a tag for beinoni forms in Hebrew. We explain how this tag is defined and how it helped us improve manual tagging accuracy to a high level, while also improving automatic tagging and helping in the task of syntactic chunking.
