Identifying the correct root of an ambiguous Hebrew word

Yaakov Hacohen-Kerner, Ofir Tzvi Erlich

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Stemming is useful for various natural language processing tasks, such as document indexing and text classification, so identifying the correct root of a given word is important. For Hebrew this is not a trivial task, owing to the complexity of Hebrew morphology and orthography. Many Hebrew words are ambiguous in the sense that each can be derived from several possible roots. In a specific context, however, a given word has only one correct root, or no root at all. We have developed a variety of features for finding the correct root of an ambiguous Hebrew word. These features fall into three groups: root-based features, conjugation-based features, and statistical features. Several common machine learning methods were tested in order to find a successful integration of the features. The best result was achieved by Naïve Bayes, with about 87% accuracy.
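The idea of combining feature groups with Naïve Bayes to choose among candidate roots can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three features (one stand-in per feature group), the toy training rows, the placeholder root names, the 0.5 threshold, and the helper names `CategoricalNaiveBayes` and `pick_root` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Tiny categorical Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # counts[label][feature_index][value] -> number of occurrences
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.values = defaultdict(set)  # distinct values seen per feature
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def log_joint(self, row, label):
        # log P(label) + sum over features of log P(value | label)
        logp = math.log(self.class_counts[label] / self.total)
        for i, value in enumerate(row):
            num = self.counts[label][i][value] + 1
            den = self.class_counts[label] + len(self.values[i])
            logp += math.log(num / den)
        return logp

    def prob(self, row, label):
        # Normalise the joint scores into a posterior over the classes.
        logs = {c: self.log_joint(row, c) for c in self.classes}
        top = max(logs.values())
        z = sum(math.exp(v - top) for v in logs.values())
        return math.exp(logs[label] - top) / z


def pick_root(model, candidates, threshold=0.5):
    """Return the candidate root most likely to be correct, or None
    when no candidate clears the threshold (the "no root" case)."""
    best_root, best_p = None, threshold
    for root, features in candidates.items():
        p = model.prob(features, True)
        if p > best_p:
            best_root, best_p = root, p
    return best_root


# Hypothetical features per (word, candidate root) pair:
#   (root found in a lexicon, conjugation pattern matches, frequency bucket)
# The rows are invented toy data; label True means "correct root".
train_rows = [
    (True, True, "high"),
    (True, True, "low"),
    (True, False, "high"),
    (False, True, "low"),
    (False, False, "low"),
    (True, False, "low"),
    (False, False, "high"),
]
train_labels = [True, True, True, False, False, False, False]

model = CategoricalNaiveBayes().fit(train_rows, train_labels)

# Two placeholder candidate roots for one ambiguous word.
candidates = {
    "root_a": (True, True, "high"),
    "root_b": (False, False, "low"),
}
chosen = pick_root(model, candidates)
```

Scoring each candidate root separately and keeping the most probable one is only one way to frame the task; the threshold lets the sketch return `None` when no candidate looks plausible, mirroring the case of a word with no root at all.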

Original language: English
Pages (from-to): 36-53
Number of pages: 18
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8003
State: Published - 2014
Externally published: Yes

Bibliographical note

Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 2014.

Keywords

  • Disambiguation
  • Hebrew-Aramaic documents
  • Machine learning methods
  • Natural language processing
  • Stemming
