A local search approximation algorithm for k-means clustering

Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, Angela Y. Wu

Research output: Contribution to conference › Paper › peer-review

148 Scopus citations

Abstract

In k-means clustering we are given a set of n data points in d-dimensional space ℜ^d and an integer k, and the problem is to determine a set of k points in ℜ^d, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the extremely high constant factors involved. There are many heuristics that are used in practice, but we know of no bounds on their performance. We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9 + ε)-approximation algorithm. We show that the approximation factor is almost tight, by giving an example for which the algorithm achieves an approximation factor of (9 - ε). To establish the practical value of the heuristic, we present an empirical study that shows that, when combined with Lloyd's algorithm, this heuristic performs quite well in practice.
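The objective and the swap idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `cost` and `single_swap_search`, the choice of the data points themselves as swap candidates, the seeding with the first k points, and the relative improvement threshold `eps` are all assumptions made for illustration.

```python
from itertools import product


def cost(points, centers):
    """Mean squared distance from each point to its nearest center."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total / len(points)


def single_swap_search(points, k, eps=1e-9):
    """Swap one center out and one candidate in while the cost improves.

    Assumptions (not taken from the paper): candidates are the data
    points, the first k points seed the search, and a swap is accepted
    only if it lowers the cost by a relative factor of at least eps.
    """
    centers = list(points[:k])
    best = cost(points, centers)
    improved = True
    while improved:
        improved = False
        # Try replacing center i with each candidate not already in use.
        for i, cand in product(range(k), points):
            if cand in centers:
                continue
            trial = centers[:i] + [cand] + centers[i + 1:]
            c = cost(points, trial)
            if c < best * (1 - eps):
                centers, best, improved = trial, c, True
                break  # restart the scan from the improved solution
    return centers, best


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(single_swap_search(pts, k=2))
```

On the four-point example, the search starts with both centers in the left-hand pair and a single swap moves one of them to the right-hand pair. A practical variant might alternate such swaps with Lloyd's iterations, as the abstract suggests, but that combination is not shown here.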

Original language: English
Pages: 10-18
Number of pages: 9
State: Published - 2002
Externally published: Yes
Event: Proceedings of the 18th Annual Symposium on Computational Geometry (SCG'02) - Barcelona, Spain
Duration: 5 Jun 2002 to 7 Jun 2002

Conference

Conference: Proceedings of the 18th Annual Symposium on Computational Geometry (SCG'02)
Country/Territory: Spain
City: Barcelona
Period: 5/06/02 to 7/06/02

Keywords

  • Approximation algorithms
  • Clustering
  • Computational geometry
  • Local search
  • k-means
