Abstract
In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper, we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.
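For readers unfamiliar with the heuristic, the following is a minimal sketch of plain Lloyd's iteration and of the objective defined in the abstract. It is not the filtering algorithm: it finds each point's nearest center by brute force and uses no kd-tree, so none of the paper's pruning is reproduced. The function names, the random choice of initial centers, and the handling of empty clusters are illustrative assumptions, not details taken from the paper.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's iteration with brute-force nearest-center search."""
    rng = random.Random(seed)
    # Assumption: the initial centers are k distinct data points chosen at random.
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest current center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the centroid of its cluster.
        new_centers = []
        for j, cluster in enumerate(clusters):
            if cluster:
                dim = len(cluster[0])
                centroid = tuple(
                    sum(p[i] for p in cluster) / len(cluster) for i in range(dim)
                )
                new_centers.append(centroid)
            else:
                # Assumption: a center that attracts no points is left in place.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # no center moved, so the iteration has converged
        centers = new_centers
    return centers


def mean_squared_distance(points, centers):
    """The k-means objective: mean squared distance to the nearest center."""
    total = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return total / len(points)
```

Each iteration of this sketch costs on the order of n·k distance computations. That per-iteration cost is what a kd-tree-based implementation such as the filtering algorithm aims to reduce.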
Original language | English |
---|---|
Pages (from-to) | 881-892 |
Number of pages | 12 |
Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
Volume | 24 |
Issue number | 7 |
DOIs | |
State | Published - Jul 2002 |
Bibliographical note
Funding Information: The authors would like to thank Azriel Rosenfeld for his comments and Hao Li for his help in running the BIRCH experiments. They also are grateful to the anonymous referees for their many valuable suggestions. This research was funded in part by the US National Science Foundation under Grant CCR-0098151 and by the US Army Research Laboratory and the US Department of Defense under
Funding
Funders | Funder number |
---|---|
US Department of Defense | |
US National Science Foundation | CCR-0098151 |
Army Research Laboratory | |
Keywords
- Computational geometry
- Data mining
- Knowledge discovery
- Machine learning
- Nearest-neighbor searching
- Pattern recognition
- k-d tree
- k-means clustering