A local search approximation algorithm for k-means clustering

Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, Angela Y. Wu

Research output: Contribution to journal › Article › peer-review

353 Scopus citations

Abstract

In k-means clustering we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the very high constant factors involved. There are many heuristics that are used in practice, but we know of no bounds on their performance. We consider the question of whether there exists a simple and practical approximation algorithm for k-means clustering. We present a local improvement heuristic based on swapping centers in and out. We prove that this yields a (9+ε)-approximation algorithm. We present an example showing that any approach based on performing a fixed number of swaps achieves an approximation factor of at least (9-ε) in all sufficiently high dimensions. Thus, our approximation factor is almost tight for algorithms based on performing a fixed number of swaps. To establish the practical value of the heuristic, we present an empirical study that shows that, when combined with Lloyd's algorithm, this heuristic performs quite well in practice.
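To make the swap idea in the abstract concrete, the following is a minimal Python sketch of a single-swap local search. It is not the authors' implementation. It assumes that candidate centers are drawn from the data points themselves and that a swap is accepted only when it lowers the cost by a relative factor `min_gain`. The names `sq_dist`, `cost`, `single_swap_search`, and `min_gain` are hypothetical and chosen for illustration. The (9+ε) guarantee stated above refers to the algorithm analyzed in the paper and should not be assumed for this simplified version.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost(points, centers):
    """Mean squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)


def single_swap_search(points, k, min_gain=1e-9, seed=0):
    """Illustrative single-swap local search (assumption: candidates are data points).

    Starts from k random data points and repeatedly replaces one center with a
    non-center data point whenever that lowers the cost by at least a relative
    factor of min_gain. Stops when a full pass finds no improving swap.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    best = cost(points, centers)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in points:
                if cand in centers:
                    continue  # swapping in an existing center changes nothing
                trial = centers[:i] + [cand] + centers[i + 1:]
                c = cost(points, trial)
                if c < best * (1 - min_gain):
                    centers, best = trial, c
                    improved = True
    return centers, best


if __name__ == "__main__":
    # Two small, well-separated groups of points in the plane.
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    found, value = single_swap_search(pts, k=2)
    print(found, value)
```

Each pass of this sketch evaluates on the order of k·n candidate swaps and recomputes the full cost for every one, so it is meant only to illustrate the acceptance rule, not to be efficient. A practical variant would typically follow each accepted swap with iterations of Lloyd's algorithm, as the abstract suggests.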

Original language: English
Pages (from-to): 89-112
Number of pages: 24
Journal: Computational Geometry: Theory and Applications
Volume: 28
Issue number: 2-3 SPEC. ISS.
State: Published - Jun 2004

Bibliographical note

Funding Information:
A preliminary version of this paper appeared in the 18th Annual ACM Symposium on Computational Geometry (SoCG’02), June 2002, Barcelona, Spain, pp. 10–18.
Corresponding author. E-mail addresses: [email protected] (T. Kanungo), [email protected] (D.M. Mount), [email protected] (N.S. Netanyahu), [email protected] (C.D. Piatko), [email protected] (R. Silverman), [email protected] (A.Y. Wu).
This material is based upon work supported by the National Science Foundation under Grant No. 0098151.

Funding

Funder: National Science Foundation
Funder number: 0098151

    Keywords

    • Approximation algorithms
    • Clustering
    • Computational geometry
    • Local search
    • k-means
