Abstract
A cross-period (diachronic) thesaurus enables users to search for information using modern terminology and obtain semantically related terms from earlier historical periods. The complex task of supporting a domain expert lexicographer in constructing a diachronic thesaurus has received little computational attention until now. In this article, we introduce a semiautomatic iterative Query Expansion (QE) scheme for supporting diachronic thesaurus construction, which identifies candidate related terms based on statistical corpus-based measures. We use ancient-modern period classification to improve the performance of the statistical co-occurrence measures and extend our methods to deal with Multi-Word Expressions (MWEs). We demonstrate the empirical benefit of our scheme for a Jewish cross-period thesaurus and evaluate its impact on recall and on the effectiveness of the lexicographer's manual efforts.
Original language | English |
---|---|
Article number | 22 |
Journal | Journal on Computing and Cultural Heritage |
Volume | 9 |
Issue number | 4 |
State | Published - Dec 2016 |
Bibliographical note
Publisher Copyright: © 2016 ACM 1556-4673/2016/12-ART22 $15.00.
Keywords
- Cultural heritage
- Diachronic thesaurus
- Hebrew
- Semantic similarity