An Algorithmic Scheme for Statistical Thesaurus Construction in a Morphologically Rich Language

Chaya Liebeskind, Ido Dagan, Jonathan Schler

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Corpus-based automatic thesaurus construction uses linguistic methods, such as part-of-speech taggers and parsers, which often perform poorly on morphologically rich languages (MRLs). Therefore, in this paper, we focused on the complex task of adapting corpus-based thesaurus construction methods for MRLs. We investigated two statistical approaches for thesaurus construction: (a) a first-order co-occurrence-based approach and (b) a second-order distributional-based approach. We explored alternative levels of morphological term representation, complemented by grouping the morphological variants. We then introduced and adopted a generic algorithmic scheme for thesaurus construction in MRLs for both the first-order and second-order approaches. Our scheme investigated alternative representation levels and offered alternative configurations. We demonstrated the empirical benefits of our methodology for the construction of a diachronic Hebrew thesaurus. We used morphological analysis tools, defined and applied a new annotation scheme, and demonstrated that its optimal configuration outperforms the baseline for both first-order and second-order corpus-based thesaurus construction approaches.
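As a rough illustration of the distinction drawn in the abstract, the sketch below contrasts a first-order score (do two terms appear together?) with a second-order score (do two terms appear in similar contexts?), after grouping morphological variants under a shared lemma. It is not the authors' implementation. The toy English corpus, the `LEMMA_OF` table standing in for a morphological analyzer, the sentence-level co-occurrence window, and the choice of the Dice coefficient and cosine similarity are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical lemma table standing in for a morphological analyzer
# (an assumption for this sketch, not the tools used in the paper).
LEMMA_OF = {"houses": "house", "homes": "home", "built": "build", "builds": "build"}

# Hypothetical toy corpus: one token list per sentence.
SENTENCES = [
    ["houses", "built", "stone"],
    ["house", "builds", "wood"],
    ["homes", "built", "stone"],
    ["home", "wood", "garden"],
]


def normalize(sentence):
    """Group morphological variants by mapping each form to its lemma."""
    return [LEMMA_OF.get(token, token) for token in sentence]


def cooccurrence_counts(sentences):
    """Count sentence frequency per term and per co-occurring term pair."""
    term_freq = Counter()
    pair_freq = defaultdict(Counter)
    for sentence in sentences:
        terms = set(normalize(sentence))
        term_freq.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair_freq[a][b] += 1
            pair_freq[b][a] += 1
    return term_freq, pair_freq


def first_order(a, b, term_freq, pair_freq):
    """First-order score: Dice coefficient of direct co-occurrence."""
    total = term_freq[a] + term_freq[b]
    return 2 * pair_freq[a][b] / total if total else 0.0


def second_order(a, b, pair_freq):
    """Second-order score: cosine similarity of co-occurrence vectors."""
    va, vb = pair_freq[a], pair_freq[b]
    dot = sum(va[k] * vb[k] for k in va if k in vb)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


term_freq, pair_freq = cooccurrence_counts(SENTENCES)
print("first-order  house/home:", first_order("house", "home", term_freq, pair_freq))
print("second-order house/home:", second_order("house", "home", pair_freq))
```

In this toy data "house" and "home" never share a sentence, so their first-order score is zero, while their second-order score is high because both co-occur with "build", "stone", and "wood". Without the lemma grouping step, the counts for "houses" and "house" would be split across separate entries, which is the kind of sparsity the abstract attributes to MRLs.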

Original language: English
Pages (from-to): 483-496
Number of pages: 14
Journal: Applied Artificial Intelligence
Volume: 33
Issue number: 6
State: Published - 12 May 2019

Bibliographical note

Publisher Copyright:
© 2019 Taylor & Francis Group, LLC.
