TY - JOUR
T1 - Automatic thesaurus construction for cross generation corpus
AU - Zohar, Hadas
AU - Liebeskind, Chaya
AU - Schler, Jonathan
AU - Dagan, Ido
PY - 2013/3
Y1 - 2013/3
N2 - This article describes methods for semiautomatic thesaurus construction for a cross-generation, cross-genre, and cross-cultural corpus. Semiautomatic thesaurus construction is a complex task, and applying it to a cross-generation corpus brings its own challenges. We used a Jewish juristic corpus containing documents and genres that were written over a span of 2000 years and contain a mix of different languages, dialects, geographies, and writing styles. We evaluated different first- and second-order methods and introduced a special annotation scheme for this problem; the evaluation showed that first-order methods performed surprisingly well. We found that in our case improving coverage is the more difficult task. To address this, we introduce a new algorithm to increase recall (coverage), which is applicable to many other problems as well and demonstrates significant improvement on our corpus.
KW - Cultural heritage
KW - Hebrew
KW - Language model
UR - http://www.scopus.com/inward/record.url?scp=84979807586&partnerID=8YFLogxK
U2 - 10.1145/2442080.2442084
DO - 10.1145/2442080.2442084
M3 - Article
AN - SCOPUS:84979807586
SN - 1556-4673
VL - 6
JO - Journal on Computing and Cultural Heritage
JF - Journal on Computing and Cultural Heritage
IS - 1
M1 - 4
ER -