Initial experiments with extraction of stopwords in Hebrew

Yaakov HaCohen-Kerner, Shmuel Yishai Blitz

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

4 Scopus citations

Abstract

Stopwords are regarded as meaningless in terms of information retrieval. Various stopword lists have been constructed for English and a few other languages. However, to the best of our knowledge, no stopword list has been constructed for Hebrew. In this ongoing work, we present an implementation of three baseline methods that attempt to extract stopwords from a data set of Israeli daily news. Two of the methods are state-of-the-art methods previously applied to other languages, and the third is proposed by the authors. Comparing the behavior of these three methods with Zipf's law shows that Zipf's law succeeds in describing the distribution of the top-occurring words according to these methods.
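As a rough illustration of the kind of comparison the abstract describes, the sketch below ranks words by raw term frequency and sets each observed count beside the count Zipf's law would predict for that rank. The abstract does not name the three methods, so plain term frequency is an assumption here and not the authors' procedure; the function names and the exponent parameter `s` are hypothetical.

```python
from collections import Counter


def rank_by_frequency(tokens):
    """Return (word, count) pairs, most frequent first; ties broken alphabetically."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def zipf_expected(top_count, rank, s=1.0):
    """Count predicted by Zipf's law for a 1-based rank, given the rank-1 count."""
    return top_count / (rank ** s)


def compare_to_zipf(tokens, top_n=10):
    """List (word, observed count, Zipf-predicted count) for the top_n words."""
    ranked = rank_by_frequency(tokens)[:top_n]
    if not ranked:
        return []
    top_count = ranked[0][1]
    return [
        (word, count, zipf_expected(top_count, rank))
        for rank, (word, count) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    # Toy token list; a real run would use a tokenized Hebrew news corpus.
    sample = "a a a a b b c".split()
    for word, observed, predicted in compare_to_zipf(sample):
        print(word, observed, round(predicted, 2))
```

The highest-ranked words are the stopword candidates under this baseline; how closely the observed counts track the predicted ones gives a quick sense of whether the top of the list follows a Zipfian distribution.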

Original language: English
Title of host publication: KDIR 2010 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval
Pages: 449-453
Number of pages: 5
State: Published - 2010
Externally published: Yes
Event: International Conference on Knowledge Discovery and Information Retrieval, KDIR 2010 - Valencia, Spain
Duration: 25 Oct 2010 - 28 Oct 2010

Publication series

Name: KDIR 2010 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval

Conference

Conference: International Conference on Knowledge Discovery and Information Retrieval, KDIR 2010
Country/Territory: Spain
City: Valencia
Period: 25/10/10 - 28/10/10

Keywords

  • Hebrew
  • Information retrieval
  • Stopwords
  • Zipf's law
