Cross-dataset clustering: Revealing corresponding themes across multiple corpora

I. Dagan, Zvika Marx, Eli Shamir

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


We present a method for identifying corresponding themes across several corpora that focus on related but distinct domains. We approach this task through simultaneous clustering of keyword sets extracted from the analyzed corpora. Our algorithm extends the information-bottleneck soft clustering method to a setting consisting of several datasets. Experimentation with topical corpora reveals similar aspects of three distinct religions. The results are evaluated by comparison with clusters constructed manually by an expert.
Original language: American English
Title of host publication: The 6th conference on Natural language learning
Publisher: Association for Computational Linguistics
State: Published - 2002

Bibliographical note

Place of conference: Taipei, Taiwan

