An efficient k-means clustering algorithm: Analysis and implementation

Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, Angela Y. Wu

Research output: Contribution to journal › Article › peer-review

4493 Scopus citations

Abstract

In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper, we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.
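
For orientation, the objective and the plain Lloyd iteration described in the abstract can be sketched as follows. This is a minimal illustration only: it is not the paper's filtering algorithm and uses no kd-tree. The function names, the random initialization from the data points, and the stopping rule are assumptions chosen for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to `point` (ties go to the lowest index)."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mean_squared_distance(points, centers):
    """The k-means objective: mean squared distance to the nearest center."""
    total = sum(squared_distance(p, centers[nearest_center(p, centers)])
                for p in points)
    return total / len(points)


def lloyd(points, k, iterations=100, seed=0):
    """Plain Lloyd iteration (brute-force assignment, no kd-tree).

    Assumption for this sketch: initial centers are k distinct data
    points drawn at random; the paper does not prescribe this here.
    """
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Update step: move each center to the centroid of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved: a local minimum
            break
        centers = new_centers
    return centers
```

Calling `lloyd(points, k)` on a list of coordinate tuples returns k centers, and `mean_squared_distance(points, centers)` evaluates the quantity being minimized. The assignment step here costs O(nk) distance computations per iteration, which is the step the abstract says the filtering algorithm speeds up using a kd-tree.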

Original language: English
Pages (from-to): 881-892
Number of pages: 12
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 24
Issue number: 7
State: Published - Jul 2002

Bibliographical note

Funding Information:
The authors would like to thank Azriel Rosenfeld for his comments and Hao Li for his help in running the BIRCH experiments. They also are grateful to the anonymous referees for their many valuable suggestions. This research was funded in part by the US National Science Foundation under Grant CCR-0098151 and by the US Army Research Laboratory and the US Department of Defense under

Funding

Funders (funder number)
US Department of Defense
US National Science Foundation (CCR-0098151)
Army Research Laboratory

    Keywords

    • Computational geometry
    • Data mining
    • Knowledge discovery
    • Machine learning
    • Nearest-neighbor searching
    • Pattern recognition
    • k-d tree
    • k-means clustering
