Tagging a Hebrew corpus: The case of participles

Meni Adler, Yael Netzer, Yoav Goldberg, David Gabay, Michael Elhadad

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

We report on an effort to build a corpus of Modern Hebrew tagged with parts of speech and morphology. We designed a tagset specific to Hebrew while focusing on four aspects: the tagset should be consistent with common linguistic knowledge; there should be maximal agreement among taggers as to the tags assigned to maintain consistency; the tagset should be useful for machine taggers and learning algorithms; and the tagset should be effective for applications relying on the tags as input features. In this paper, we illustrate these issues by explaining our decision to introduce a tag for beinoni forms in Hebrew. We explain how this tag is defined, and how it helped us improve manual tagging accuracy to a high level, while improving automatic tagging and helping in the task of syntactic chunking.

Original language: English
Title of host publication: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
Publisher: European Language Resources Association (ELRA)
Pages: 3167-3174
Number of pages: 8
ISBN (Electronic): 2951740840, 9782951740846
State: Published - 2008
Externally published: Yes
Event: 6th International Conference on Language Resources and Evaluation, LREC 2008 - Marrakech, Morocco
Duration: 28 May 2008 to 30 May 2008

Publication series

Name: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008

Conference

Conference: 6th International Conference on Language Resources and Evaluation, LREC 2008
Country/Territory: Morocco
City: Marrakech
Period: 28/05/08 to 30/05/08

Bibliographical note

Funding Information:
This work is supported in part by the Lynn and William Frankel Center for Computer Sciences.
