TY - GEN
T1 - Knowledge Discovery in Textual Databases (KDT)
AU - Feldman, Ronen
AU - Dagan, I.
N1 - Place of conference: Heraclion, Crete, Greece
PY - 1995
Y1 - 1995
N2 - The information age is characterized by a rapid growth in the
amount of information available in electronic media. Traditional
data handling methods are not adequate to cope with this
information flood. Knowledge Discovery in Databases (KDD) is
a new paradigm that focuses on computerized exploration of
large amounts of data and on discovery of relevant and
interesting patterns within them. While most work on KDD is
concerned with structured databases, it is clear that this
paradigm is required for handling the huge amount of
information that is available only in unstructured textual form.
To apply traditional KDD to texts, it is necessary to impose
some structure on the data that would be rich enough to allow
for interesting KDD operations. On the other hand, we have to
consider the severe limitations of current text processing
technology and define rather simple structures that can be
extracted from texts fairly automatically and at a reasonable
cost. We propose using a text categorization paradigm to
annotate text articles with meaningful concepts that are
organized in a hierarchical structure. We suggest that this
relatively simple annotation is rich enough to provide the basis
for a KDD framework, enabling data summarization,
exploration of interesting patterns, and trend analysis. This
research combines the KDD and text categorization paradigms
and suggests advances to the state of the art in both areas.
AB - The information age is characterized by a rapid growth in the
amount of information available in electronic media. Traditional
data handling methods are not adequate to cope with this
information flood. Knowledge Discovery in Databases (KDD) is
a new paradigm that focuses on computerized exploration of
large amounts of data and on discovery of relevant and
interesting patterns within them. While most work on KDD is
concerned with structured databases, it is clear that this
paradigm is required for handling the huge amount of
information that is available only in unstructured textual form.
To apply traditional KDD to texts, it is necessary to impose
some structure on the data that would be rich enough to allow
for interesting KDD operations. On the other hand, we have to
consider the severe limitations of current text processing
technology and define rather simple structures that can be
extracted from texts fairly automatically and at a reasonable
cost. We propose using a text categorization paradigm to
annotate text articles with meaningful concepts that are
organized in a hierarchical structure. We suggest that this
relatively simple annotation is rich enough to provide the basis
for a KDD framework, enabling data summarization,
exploration of interesting patterns, and trend analysis. This
research combines the KDD and text categorization paradigms
and suggests advances to the state of the art in both areas.
UR - https://scholar.google.co.il/scholar?q=Knowledge+Discovery+in+Textual+Databases&btnG=&hl=en&as_sdt=0%2C5
M3 - Conference contribution
BT - ECML Workshop in Knowledge Discovery
ER -