Abstract
This paper proposes a new paradigm and a computational framework for revealing equivalencies (analogies) between sub-structures of distinct composite systems that are initially represented by unstructured data sets. For this purpose, we introduce and investigate a variant of traditional data clustering, termed coupled clustering, which outputs a configuration of corresponding subsets of two such representative sets. We apply our method to synthetic as well as textual data. Its achievements in detecting topical correspondences between textual corpora are evaluated through comparison to performance of human experts.
Original language | American English |
---|---|
Title of host publication | 18th International Conference on Machine Learning (ICML) |
State | Published - 2001 |