TY - JOUR
T1 - An Algorithmic Scheme for Statistical Thesaurus Construction in a Morphologically Rich Language
AU - Liebeskind, Chaya
AU - Dagan, Ido
AU - Schler, Jonathan
N1 - Publisher Copyright: © 2019 Taylor & Francis Group, LLC.
PY - 2019/5/12
Y1 - 2019/5/12
AB - Corpus-based automatic thesaurus construction uses linguistic methods, such as part-of-speech taggers and parsers, which often perform poorly on morphologically rich languages (MRLs). Therefore, in this paper, we focused on the complex task of adapting corpus-based thesaurus construction methods for MRLs. We investigated two statistical approaches for thesaurus construction: a) a first-order co-occurrence-based approach and b) a second-order distributional-based approach. We explored alternative levels of morphological term representation, complemented by grouping the morphological variants. We then introduced and adopted a generic algorithmic scheme for thesaurus construction in MRLs for both first-order and second-order approaches. Our scheme investigated alternative representation levels and offered alternative configurations. We demonstrated the empirical benefits of our methodology for the construction of a diachronic Hebrew thesaurus. We used morphological analysis tools, defined and applied a new annotation scheme, and demonstrated its optimal configuration, which outperformed the baseline for both first-order and second-order corpus-based thesaurus construction approaches.
UR - http://www.scopus.com/inward/record.url?scp=85062335504&partnerID=8YFLogxK
U2 - 10.1080/08839514.2019.1583447
DO - 10.1080/08839514.2019.1583447
M3 - Article
AN - SCOPUS:85062335504
SN - 0883-9514
VL - 33
SP - 483
EP - 496
JO - Applied Artificial Intelligence
JF - Applied Artificial Intelligence
IS - 6
ER -