Document summarization by sentence extraction using a genetic algorithm

Yaakov Hacohen-Kerner, Eylon Malin, Itschack Chasson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Text summarization is a research domain that attracts many research groups around the world. It is the process of automatically creating a condensed version of a given text that provides useful information for the user. Semitic language processing in general is of great interest today; however, the Hebrew language has been relatively little studied. In this research, the application domain is articles on Jewish law written in Hebrew. These documents are summarized by extracting the most relevant sentences. We have developed seven general baseline extraction methods and two Hebrew-specific methods. Using a genetic algorithm (GA), these methods are combined into a hybrid method. The success rate of the GA is reasonable compared to the rates achieved by other summarization systems (although they involve different languages, domains and features). This model in general, and the baseline methods in particular, can be extended with reasonable effort to documents in similar domains written in other languages. Investigating other extraction methods and other machine learning (ML) methods might lead to improved results.
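The abstract does not specify the extraction features, the fitness function, or the GA operators, so the following is only a minimal sketch of how a genetic algorithm could combine several sentence-scoring methods into one hybrid scorer. Everything in it is an assumption made for illustration: the function names (`score`, `extract`, `fitness`, `evolve`), the toy feature values in `docs`, the overlap-based fitness, and the elitist selection with one-point crossover. It is not the authors' implementation.

```python
import random

def score(weights, features):
    # Hybrid score: weighted sum of the per-method scores of one sentence.
    return sum(w * f for w, f in zip(weights, features))

def extract(weights, sentences, k):
    # Indices of the k sentences with the highest hybrid score.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(weights, sentences[i]),
                    reverse=True)
    return set(ranked[:k])

def fitness(weights, docs):
    # Assumed fitness: mean fraction of reference sentences recovered.
    total = 0.0
    for sentences, reference in docs:
        chosen = extract(weights, sentences, len(reference))
        total += len(chosen & reference) / len(reference)
    return total / len(docs)

def evolve(docs, n_features, pop_size=30, generations=40,
           mutation_rate=0.2, seed=0):
    # n_features must be at least 2 for one-point crossover.
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=lambda w: fitness(w, docs), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:     # mutate a single weight
                child[rng.randrange(n_features)] = rng.random()
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, docs))

# Hypothetical toy data: each sentence is a vector of three method scores,
# paired with the set of sentence indices in a reference extract.
docs = [
    ([[0.9, 0.1, 0.5], [0.2, 0.8, 0.5], [0.8, 0.3, 0.5], [0.1, 0.9, 0.5]], {0, 2}),
    ([[0.3, 0.7, 0.2], [0.7, 0.2, 0.9], [0.1, 0.6, 0.4]], {1}),
]

best = evolve(docs, n_features=3)
print("weights:", best, "fitness:", fitness(best, docs))
```

In this sketch each individual is a vector of method weights, and a summary is the set of top-scoring sentences under those weights. Keeping the better half of each generation unchanged means the best fitness never decreases between generations.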

Original language: English
Title of host publication: 20th International Conference on Computer Applications in Industry and Engineering 2007, CAINE 2007
Pages: 46-52
Number of pages: 7
State: Published - 2007
Externally published: Yes
Event: 20th International Conference on Computer Applications in Industry and Engineering 2007, CAINE 2007 - San Francisco, CA, United States
Duration: 7 Nov 2007 to 9 Nov 2007

Publication series

Name: 20th International Conference on Computer Applications in Industry and Engineering 2007, CAINE 2007

Conference

Conference: 20th International Conference on Computer Applications in Industry and Engineering 2007, CAINE 2007
Country/Territory: United States
City: San Francisco, CA
Period: 7/11/07 to 9/11/07

Keywords

  • Genetic algorithm
  • Hebrew
  • Jewish law articles
  • Sentence extraction
  • Text summarization
