Cross-component clustering for template induction

Zvika Marx, I. Dagan, Eli Shamir

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

We suggest an unsupervised approach to template induction for information extraction, through detecting sub-topics and themes that cut across the documents of a topical corpus. We introduce a new method, cross-component clustering, that simultaneously clusters the components forming our setting, each of which consists of the words of a single article. Our algorithm is derived from the Information Bottleneck clustering algorithm. The resulting clusters are found to be in systematic correspondence with sets of terms that are used in filling the slots of the MUC3/4 ready-made template, which was used for evaluation.
Original language: American English
Title of host publication: Workshop on Text Learning (TextML-2002)
State: Published - 2002

Bibliographical note

Place of conference: Sydney, Australia
