TY - JOUR
T1 - Words as classifiers of documents according to their historical period and the ethnic origin of their authors
AU - HaCohen-Kerner, Yaakov
AU - Mughaz, Dror
AU - Beck, Hananya
AU - Yehudai, Elchai
PY - 2008/4
Y1 - 2008/4
N2 - Text classification presents challenges due to the large number of features, their dependencies, and the large number of training documents. In this research, we investigate whether words are appropriate features for classifying documents according to the ethnic group of their authors and/or the historical period in which they were written. To the best of our knowledge, these kinds of classification have not been explored before by others. In addition, we investigate Forman's (2003) claim about not using common words for classification tasks. The application domain was articles referring to Jewish law written in Hebrew-Aramaic, which have been little studied. Different experiments using SVM and InfoGain present highly successful results (more than 95%). The results indicate that the use of common words as features contributes to making the learning task more efficient and more accurate.
UR - http://www.scopus.com/inward/record.url?scp=42649111732&partnerID=8YFLogxK
U2 - 10.1080/01969720801944299
DO - 10.1080/01969720801944299
M3 - Article
AN - SCOPUS:42649111732
SN - 0196-9722
VL - 39
SP - 213
EP - 228
JO - Cybernetics and Systems
JF - Cybernetics and Systems
IS - 3
ER -