Abstract
Pairwise clustering methods partition a dataset using pairwise similarities between data points. The pairwise similarity matrix can be used to define a Markov random walk on the data points, and this view provides a probabilistic interpretation of spectral clustering methods. We use this probabilistic model to define a novel clustering cost function based on maximizing the mutual information between consecutively visited clusters of states of the Markov chain defined by the similarity matrix. This cost function can be viewed as an extension of the information-bottleneck principle to pairwise clustering. We show that the complexity of a sequential clustering implementation of the proposed cost function is linear in the dataset size on sparse graphs. The improved performance and reduced computational complexity of the proposed algorithm are demonstrated on several standard datasets and on an image segmentation task.
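The cost function in the abstract can be illustrated with a short sketch. It assumes a dense, symmetric, nonnegative similarity matrix `W`. The random walk is taken to be P = D^{-1} W with stationary distribution pi_i = d_i / vol(W), so the joint probability of two consecutive states is pi_i P_ij = W_ij / vol(W). Summing this over cluster pairs gives the joint distribution of consecutively visited clusters, whose mutual information is the quantity being maximized. The function names and the plain greedy reassignment loop are hypothetical illustrations, not the authors' algorithm. In particular, this naive version recomputes the full objective for every candidate move, so it is not the linear-time sparse-graph implementation the abstract describes.

```python
import numpy as np


def cluster_mutual_information(W, labels, k):
    """Hypothetical helper: I(C_t; C_{t+1}) for the walk P = D^{-1} W.

    Assumes W is symmetric and nonnegative, so the joint probability of
    consecutive states is p(i, j) = pi_i * P_ij = W_ij / sum(W).
    """
    joint_states = W / W.sum()
    # One-hot cluster indicators: Z[i, c] = 1 if point i is in cluster c.
    Z = np.eye(k)[labels]
    # Aggregate to clusters: p(A, B) = sum over i in A, j in B of p(i, j).
    joint = Z.T @ joint_states @ Z
    p_from = joint.sum(axis=1)
    p_to = joint.sum(axis=0)
    mask = joint > 0  # empty clusters and zero entries contribute nothing
    ratio = joint[mask] / np.outer(p_from, p_to)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))


def sequential_mi_clustering(W, k, n_sweeps=50, seed=0):
    """Hypothetical greedy sequential clustering: repeatedly move each point
    to the cluster that maximizes the mutual information above."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    labels = rng.integers(k, size=n)
    for _ in range(n_sweeps):
        changed = False
        for i in rng.permutation(n):
            old = labels[i]
            scores = np.empty(k)
            for c in range(k):
                labels[i] = c
                scores[c] = cluster_mutual_information(W, labels, k)
            # Keep the current cluster on ties to avoid oscillating.
            best = old if scores[old] >= scores.max() - 1e-12 else int(np.argmax(scores))
            labels[i] = best
            changed = changed or best != old
        if not changed:  # no point moved during a full sweep: local optimum
            break
    return labels


if __name__ == "__main__":
    # Two dense blocks joined by weak cross-block similarities.
    W = np.full((8, 8), 0.01)
    W[:4, :4] = 1.0
    W[4:, 4:] = 1.0
    print(sequential_mi_clustering(W, k=2))
```

On this toy two-block graph, the partition that matches the blocks gives a clearly positive mutual information. A balanced partition that mixes the blocks makes consecutive clusters independent, so its mutual information is zero.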
Original language | English |
---|---|
Pages (from-to) | 284-293 |
Number of pages | 10 |
Journal | Neurocomputing |
Volume | 182 |
DOIs | |
State | Published - 19 Mar 2016 |
Bibliographical note
Publisher Copyright: © 2015 Elsevier B.V.
Keywords
- Graph clustering
- Mutual information
- Normalized-cut
- Pairwise clustering
- Spectral clustering