Tagging a Hebrew corpus: The case of participles

  • Meni Adler
  • Yael Netzer
  • Yoav Goldberg
  • David Gabay
  • Michael Elhadad

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

We report on an effort to build a corpus of Modern Hebrew tagged with parts of speech and morphology. We designed a tagset specific to Hebrew while focusing on four aspects: the tagset should be consistent with common linguistic knowledge; taggers should agree as closely as possible on the tags they assign, so that the annotation stays consistent; the tagset should be useful for machine taggers and learning algorithms; and the tagset should be effective for applications that rely on the tags as input features. In this paper, we illustrate these issues by explaining our decision to introduce a tag for beinoni (participle) forms in Hebrew. We explain how this tag is defined, and how it helped us raise manual tagging accuracy to a high level while also improving automatic tagging and helping in the task of syntactic chunking.
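Agreement among human taggers, one of the four design criteria above, is commonly quantified as token-level observed agreement and Cohen's kappa, which corrects for agreement expected by chance. The sketch below is an illustration only: the abstract does not say which agreement measure the authors used, and the tag names (including `BEINONI`) and the two annotation sequences are hypothetical toy data, not results from the corpus.

```python
from collections import Counter


def observed_agreement(tags_a, tags_b):
    """Fraction of tokens on which two annotators assigned the same tag."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("annotations must be non-empty and aligned token by token")
    return sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)


def cohens_kappa(tags_a, tags_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = observed_agreement(tags_a, tags_b)
    n = len(tags_a)
    freq_a = Counter(tags_a)
    freq_b = Counter(tags_b)
    # Chance agreement: probability that both annotators, choosing tags
    # independently at their own observed rates, pick the same tag.
    p_e = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical tag everywhere; treat as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical tag sequences from two annotators over the same six tokens.
    annotator_a = ["NOUN", "BEINONI", "VERB", "BEINONI", "ADJ", "NOUN"]
    annotator_b = ["NOUN", "BEINONI", "BEINONI", "BEINONI", "ADJ", "VERB"]
    print(round(observed_agreement(annotator_a, annotator_b), 3))  # 0.667
    print(round(cohens_kappa(annotator_a, annotator_b), 3))        # 0.538
```

Kappa is lower than raw agreement here because both annotators use `BEINONI` often, so some of their matches would be expected by chance alone.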

Funding

This work is supported in part by the Lynn and William Frankel Center for Computer Sciences.
