Abstract
We report on an effort to build a corpus of Modern Hebrew tagged with parts of speech and morphology. We designed a tagset specific to Hebrew, focusing on four aspects: the tagset should be consistent with common linguistic knowledge; taggers should agree as far as possible on the tags they assign, so that the corpus remains consistent; the tagset should be useful for machine taggers and learning algorithms; and the tagset should be effective for applications that rely on the tags as input features. In this paper, we illustrate these issues by explaining our decision to introduce a tag for beinoni forms in Hebrew. We explain how this tag is defined and how it helped us raise manual tagging accuracy to a high level, while also improving automatic tagging and helping in the task of syntactic chunking.
Original language | English |
---|---|
Title of host publication | Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008 |
Publisher | European Language Resources Association (ELRA) |
Pages | 3167-3174 |
Number of pages | 8 |
ISBN (Electronic) | 2951740840, 9782951740846 |
State | Published - 2008 |
Externally published | Yes |
Event | 6th International Conference on Language Resources and Evaluation, LREC 2008, Marrakech, Morocco. Duration: 28 May 2008 → 30 May 2008 |
Publication series
Name | Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008 |
---|
Conference
Conference | 6th International Conference on Language Resources and Evaluation, LREC 2008 |
---|---|
Country/Territory | Morocco |
City | Marrakech |
Period | 28/05/08 → 30/05/08 |
Bibliographical note
Funding Information: This work is supported in part by the Lynn and William Frankel Center for Computer Sciences.
Funding
Funders | Funder number |
---|---|
Lynn and William Frankel Center for Computer Sciences |