Tutorial 4. Mining unstructured data

Ronen Feldman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

The information age has made it easy to store large amounts of data. The proliferation of documents available on the Web, on corporate intranets, on news wires, and elsewhere is overwhelming. However, while the amount of data available to us is constantly increasing, our ability to absorb and process this information remains constant. Search engines only exacerbate the problem by making more and more documents available within a few keystrokes. Text Mining is a new and exciting research area that tries to solve the information overload problem by using techniques from data mining, machine learning, natural language processing (NLP), information retrieval (IR), and knowledge management. Text Mining involves the preprocessing of document collections (text categorization, term extraction), the storage of the intermediate representations, the techniques used to analyze these intermediate representations (distribution analysis, clustering, trend analysis, association rules, etc.), and the visualization of the results. In this tutorial we will present the general theory of Text Mining and will demonstrate several systems that use these principles to enable interactive exploration of large textual collections. We will present a general architecture for text mining and will outline the algorithms and data structures behind the systems. Special emphasis will be given to efficient algorithms for very large document collections, tools for visualizing such document collections, the use of intelligent agents to perform text mining on the internet, and the use of information extraction to better capture the major themes of the documents. The tutorial will cover the state of the art in this rapidly growing area of research. Several real-world applications of text mining will be presented.

Original language: English
Title of host publication: Tutorial Notes of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 1999
Editors: Jiawei Han
Publisher: Association for Computing Machinery
Pages: 182-236
Number of pages: 55
ISBN (Electronic): 1581131712, 9781581131710
State: Published - 1 Aug 1999
Event: 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 1999 - San Diego, United States
Duration: 15 Aug 1999 to 18 Aug 1999

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: Part F129196

Conference

Conference: 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 1999
Country/Territory: United States
City: San Diego
Period: 15/08/99 to 18/08/99

Bibliographical note

Publisher Copyright:
© ACM 1999.
