TY - JOUR

T1 - The Analysis of a Simple k-Means Clustering Algorithm

AU - Kanungo, Tapas

AU - Mount, David M.

AU - Netanyahu, Nathan S.

AU - Piatko, Christine

AU - Silverman, Ruth

AU - Wu, Angela Y.

PY - 2002

Y1 - 2002

AB - Descriptive note: Technical report.
K-means clustering is a popular clustering technique that is used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper, we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is very easy to implement. It differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than for the center points. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time. Second, we have implemented the algorithm and performed a number of empirical studies, both on synthetically generated data and on real data from applications in color quantization, compression, and segmentation.

UR - http://primoprd.tau.ac.il:1701/primo_library/libweb/action/search.do?fn=search&ct=search&initialSearch=true&mode=Basic&tab=default_tab&indx=1&dum=true&srt=rank&vid=TAU1&frbg=&tb=t&vl%28freeText0%29=The+analysis+of+a+simple+k-means+clustering+algorithm&scp

M3 - Article

SN - 0162-8828

VL - 24

SP - 881

EP - 892

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 7

ER -
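The abstract refers to Lloyd's algorithm, which alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points, so that the mean squared distance (1/n) * sum_i min_j ||x_i - c_j||^2 never increases. The following is a minimal Python/NumPy sketch, under stated assumptions, of how a kd-tree built over the data points (not the centers) can speed up the assignment step. Each node stores its bounding box, point count, and the vector sum of its points. Candidate centers that cannot be closest to any point in a box are pruned, so that a whole subtree can be credited to a single center in one step. The specific pruning test (compare each candidate against the candidate nearest the box midpoint, at the box vertex furthest in the direction of their difference), the random initialization, and all names (`Node`, `filter_node`, `filtering_step`, `filtering_kmeans`, `leaf_size`, `iters`, `seed`) are illustrative assumptions and are not the authors' implementation.

```python
import numpy as np

# Hypothetical names throughout; the pruning rule is an assumption, not taken from this record.


class Node:
    """kd-tree node: bounding box, point count, and vector sum of the points below it."""

    def __init__(self, pts, leaf_size=8):
        self.lo, self.hi = pts.min(axis=0), pts.max(axis=0)
        self.count = len(pts)
        self.wsum = pts.sum(axis=0)  # sum of points; wsum / count is the node's centroid
        self.pts = self.left = self.right = None
        dim = int(np.argmax(self.hi - self.lo))  # split along the widest side of the box
        if self.count <= leaf_size or self.hi[dim] == self.lo[dim]:
            self.pts = pts  # leaf: keep the points themselves
            return
        order = np.argsort(pts[:, dim])
        mid = self.count // 2  # median split on the chosen coordinate
        self.left = Node(pts[order[:mid]], leaf_size)
        self.right = Node(pts[order[mid:]], leaf_size)


def filter_node(node, cand, centers, sums, counts):
    """Credit every point under `node` to its nearest center among candidate indices `cand`."""
    if node.pts is not None:
        # Leaf: brute-force nearest candidate for each stored point.
        d = ((node.pts[:, None, :] - centers[cand][None, :, :]) ** 2).sum(axis=2)
        owner = cand[np.argmin(d, axis=1)]
        np.add.at(sums, owner, node.pts)
        np.add.at(counts, owner, 1)
        return
    mid = (node.lo + node.hi) / 2
    zstar = cand[int(np.argmin(((centers[cand] - mid) ** 2).sum(axis=1)))]
    keep = [zstar]
    for z in cand:
        if z == zstar:
            continue
        u = centers[z] - centers[zstar]
        v = np.where(u > 0, node.hi, node.lo)  # box vertex maximizing u . x
        # If z is not strictly closer than zstar at v, it is not closer anywhere in the box.
        if ((centers[z] - v) ** 2).sum() < ((centers[zstar] - v) ** 2).sum():
            keep.append(z)
    if len(keep) == 1:
        # Only zstar survives: assign the whole subtree at once using the stored sum.
        sums[zstar] += node.wsum
        counts[zstar] += node.count
        return
    keep = np.array(keep)
    filter_node(node.left, keep, centers, sums, counts)
    filter_node(node.right, keep, centers, sums, counts)


def filtering_step(root, centers):
    """One Lloyd iteration: assign points via the tree, then move centers to cluster means."""
    k = len(centers)
    sums = np.zeros_like(centers)
    counts = np.zeros(k, dtype=np.int64)
    filter_node(root, np.arange(k), centers, sums, counts)
    new = centers.copy()
    nonempty = counts > 0  # an empty cluster keeps its previous center
    new[nonempty] = sums[nonempty] / counts[nonempty][:, None]
    return new


def filtering_kmeans(points, k, iters=20, seed=0, leaf_size=8):
    """Run a fixed number of Lloyd iterations from k randomly chosen data points."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    root = Node(points, leaf_size)  # built once over the data, reused every iteration
    for _ in range(iters):
        centers = filtering_step(root, centers)
    return centers
```

Usage, assuming `X` is an n-by-d NumPy array: `centers = filtering_kmeans(X, k=3)`. Because the tree is built once over the data and only the candidate sets change between iterations, a single `filtering_step` should produce the same centers as a brute-force Lloyd step (up to floating-point rounding and exact distance ties), which makes it easy to check against a direct implementation.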