A cross-period (diachronic) thesaurus enables users to search for information using modern terminology and obtain semantically related terms from earlier historical periods. The complex task of supporting the construction of a diachronic thesaurus by a domain expert lexicographer has hardly been addressed computationally until now. In this article, we introduce a semiautomatic iterative Query Expansion (QE) scheme for supporting diachronic thesaurus construction, which identifies candidate related terms based on statistical corpus-based measures. We use ancient-modern period classification to increase the performance of the statistical cooccurrence measures and extend our methods to deal with Multi-Word Expressions (MWEs). We demonstrate the empirical benefit of our scheme for a Jewish cross-period thesaurus and evaluate its impact on recall and on the effectiveness of the lexicographer's manual efforts.
Bibliographical notePublisher Copyright:
© 2016 ACM 1556-4673/2016/12-ART22 $15.00.
- Cultural heritage
- Diachronic thesaurus
- Semantic similarity