Abstract
We report on an effort to build a corpus of Modern Hebrew tagged with parts of speech and morphology. We designed a Hebrew-specific tagset with four goals in mind: the tagset should be consistent with common linguistic knowledge; human taggers should agree as much as possible on the tags they assign, so that the annotation stays consistent; the tagset should be useful for machine taggers and learning algorithms; and the tagset should be effective for applications that rely on the tags as input features. In this paper, we illustrate these issues by explaining our decision to introduce a tag for beinoni (participle) forms in Hebrew. We explain how this tag is defined and how it helped us raise manual tagging accuracy to a high level, while also improving automatic tagging and helping in the task of syntactic chunking.
| Original language | English |
|---|---|
| Title | Tagging a Hebrew corpus: The case of participles |
| Journal | Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008 |
| State | Published - 1 Jan 2008 |
Funding
This work is supported in part by the Lynn and William Frankel Center for Computer Sciences.