Mining text using keyword distributions

Ronen Feldman, Ido Dagan, Haym Hirsh

Research output: Contribution to journal, Review article, peer-review

107 Scopus citations

Abstract

Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. This paper describes the KDT system for Knowledge Discovery in Text, in which documents are labeled by keywords, and knowledge discovery is performed by analyzing the co-occurrence frequencies of the various keywords labeling the documents. We show how this keyword-frequency approach supports a range of KDD operations, providing a suitable foundation for knowledge discovery in, and exploration of, collections of unstructured text.
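The abstract describes discovery that works only on the keywords labeling each document and on how often those keywords co-occur. The following is a minimal Python sketch of that idea, not the KDT implementation. The toy documents and the names `keyword_frequencies`, `cooccurrence_frequencies`, `keyword_distribution`, and `kl_divergence` are hypothetical, and using relative entropy (KL divergence) to compare a conditional keyword distribution with the collection-wide one is an illustrative assumption about how distributions might be compared.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy collection: each document is represented only by its keyword labels.
docs = [
    {"iran", "oil", "opec"},
    {"iran", "usa", "oil"},
    {"usa", "wheat"},
    {"opec", "oil"},
]


def keyword_frequencies(docs):
    """Count the documents labeled with each keyword."""
    return Counter(k for d in docs for k in d)


def cooccurrence_frequencies(docs):
    """Count the documents labeled with each unordered pair of keywords."""
    return Counter(pair for d in docs for pair in combinations(sorted(d), 2))


def keyword_distribution(docs, keywords, given=None):
    """Distribution over `keywords`, optionally restricted to documents labeled `given`.

    Counts are normalized over the chosen keyword set only, so the values sum
    to 1 whenever at least one selected document carries one of those keywords.
    """
    selected = [d for d in docs if given is None or given in d]
    counts = Counter(k for d in selected for k in d if k in keywords)
    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in sorted(keywords)}


def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)


if __name__ == "__main__":
    keywords = {"iran", "usa", "opec"}
    overall = keyword_distribution(docs, keywords)
    given_oil = keyword_distribution(docs, keywords, given="oil")
    print(cooccurrence_frequencies(docs).most_common(2))
    print(given_oil, round(kl_divergence(given_oil, overall), 3))
```

In this sketch, keyword pairs with high co-occurrence counts, or a conditional distribution that diverges noticeably from the collection-wide one, are the kind of pattern a keyword-distribution approach would surface for a user to inspect.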

Original language: English
Pages (from-to): 281-300
Number of pages: 20
Journal: Journal of Intelligent Information Systems
Volume: 10
Issue number: 3
State: Published, May 1998

Bibliographical note

Funding Information:
This research was supported by NSF grant IRI-9509819 and by grant 8615-1-96 from the Israeli Ministry of Science. The authors would like to thank the reviewers for helpful comments given on drafts of this paper.

Keywords

  • Data mining
  • Distribution comparison
  • Text categorization
  • Text mining
  • Trend analysis
