Parallel pairwise clustering

Elad Yom-Tov, Noam Slonim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

Given the pairwise affinity relations associated with a set of data items, the goal of a clustering algorithm is to automatically partition the data into a small number of homogeneous clusters. However, since the input size is quadratic in the number of data points, existing algorithms are infeasible for many practical applications. Here, we propose a simple strategy to cluster massive data by randomly splitting the original affinity matrix into small, manageable affinity matrices that are clustered independently. Our proposal is most appealing in a parallel computing environment where, at each iteration, each worker node clusters a subset of the input data and the results from all workers are then integrated in a master node to create a new clustering partition over the entire data. We demonstrate that this approach yields high-quality clustering partitions for various real-world problems, even though at each iteration only small fractions of the original data matrix are examined and at no point is the entire affinity matrix stored in memory or even computed. Furthermore, we demonstrate that the proposed algorithm has intriguing stochastic convergence properties that provide further insight into the clustering problem.
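The split, cluster, and integrate loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the affinity function, the per-worker clustering rule, or the integration step, so everything below is an assumption made for illustration. In particular, the Gaussian `affinity`, the "move each item to the cluster with the highest average affinity within its block" rule, and the names `recluster_block` and `parallel_pairwise_clustering` are hypothetical. The workers run serially here; in a parallel setting each block would go to a separate worker node.

```python
import math
import random


def affinity(x, y):
    # Assumed Gaussian affinity between two points (illustrative choice only).
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))


def recluster_block(points, block, labels, k):
    """Hypothetical worker step: re-assign the items of one block using
    only that block's own small affinity sub-matrix."""
    sub = {(i, j): affinity(points[i], points[j])
           for i in block for j in block if i != j}
    new_labels = {}
    for i in block:
        best, best_score = labels[i], -1.0
        for c in range(k):
            members = [j for j in block if j != i and labels[j] == c]
            if not members:
                continue  # cluster c has no other item in this block
            score = sum(sub[(i, j)] for j in members) / len(members)
            if score > best_score:
                best, best_score = c, score
        new_labels[i] = best
    return new_labels


def parallel_pairwise_clustering(points, k, n_blocks, n_iters, seed=0, init=None):
    """Hypothetical master loop: random split, independent block updates,
    then integration into one partition over all items."""
    rng = random.Random(seed)
    n = len(points)
    labels = list(init) if init is not None else [rng.randrange(k) for _ in range(n)]
    for _ in range(n_iters):
        order = list(range(n))
        rng.shuffle(order)  # a fresh random split at every iteration
        blocks = [order[b::n_blocks] for b in range(n_blocks)]
        # Each call below is independent and could run on its own worker.
        updates = [recluster_block(points, block, labels, k) for block in blocks]
        for update in updates:  # master integrates the workers' results
            for i, c in update.items():
                labels[i] = c
    return labels
```

Each call to `recluster_block` computes affinities only among the items of its own block, so the full pairwise affinity matrix is never built or stored. For example, `parallel_pairwise_clustering(points, k=2, n_blocks=4, n_iters=20)` returns one cluster label per input point. Because the splits and the initial labels are random, the resulting partition depends on `seed`.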

Original language: English
Title of host publication: Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133
Pages: 741-751
Number of pages: 11
State: Published - 2009
Externally published: Yes
Event: 9th SIAM International Conference on Data Mining 2009, SDM 2009 - Sparks, NV, United States
Duration: 30 Apr 2009 → 2 May 2009

Publication series

Name: Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics
Volume: 2

Conference

Conference: 9th SIAM International Conference on Data Mining 2009, SDM 2009
Country/Territory: United States
City: Sparks, NV
Period: 30/04/09 → 2/05/09
