TY - GEN
T1 - Initial experiments with extraction of stopwords in Hebrew
AU - HaCohen-Kerner, Yaakov
AU - Blitz, Shmuel Yishai
PY - 2010
Y1 - 2010
N2 - Stopwords are regarded as meaningless in terms of information retrieval. Various stopword lists have been constructed for English and a few other languages. However, to the best of our knowledge, no stopword list has been constructed for Hebrew. In this ongoing work, we present an implementation of three baseline methods that attempt to extract stopwords from a data set of Israeli daily news. Two of the methods are state-of-the-art methods previously applied to other languages, and the third is proposed by the authors. A comparison of the behavior of these three methods with the behavior of Zipf's law shows that Zipf's law succeeds in describing the distribution of the top-occurring words according to these methods.
KW - Hebrew
KW - Information retrieval
KW - Stopwords
KW - Zipf's law
UR - http://www.scopus.com/inward/record.url?scp=78651432882&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:78651432882
SN - 9789898425287
T3 - KDIR 2010 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval
SP - 449
EP - 453
BT - KDIR 2010 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval
T2 - International Conference on Knowledge Discovery and Information Retrieval, KDIR 2010
Y2 - 25 October 2010 through 28 October 2010
ER -