Abstract
This paper proposes a new paradigm and a computational framework for revealing equivalencies (analogies) between sub-structures of distinct composite systems that are initially represented by unstructured data sets. For this purpose, we introduce and investigate a variant of traditional data clustering, termed coupled clustering, which outputs a configuration of corresponding subsets of two such representative sets. We apply our method to synthetic as well as textual data. Its achievements in detecting topical correspondences between textual corpora are evaluated through comparison with human performance.
Original language | English |
---|---|
Pages (from-to) | 747-780 |
Number of pages | 34 |
Journal | Journal of Machine Learning Research |
Volume | 3 |
Issue number | 4-5 |
State | Published - 15 May 2003 |
Bibliographical note
Funding Information: We acknowledge the support of the Hungarian state and the European Union TAMOP-4.2.2A-11/1/KONV-2012-0072 and TAMOP-4.1.1C-12/1/KONV-2012-0017. This paper was also supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.
Keywords
- Clustering
- Data mining in texts
- Natural language processing
- Structure mapping
- Unsupervised learning