Analysis of a simple k-means clustering algorithm

Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine Piatko, Ruth Silverman, Angela Y. Wu

Research output: Contribution to conference › Paper › peer-review

94 Scopus citations

Abstract

K-means clustering is a very popular clustering technique used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. The algorithm is easy to implement. It differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than for the center points. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time. Second, we have implemented the algorithm and performed a number of empirical studies, both on synthetically generated data and on real data from applications in color quantization, compression, and segmentation.
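The idea in the abstract (one Lloyd iteration driven by a kd-tree built over the data points, not the centers) can be illustrated with the minimal Python sketch below. It is not the authors' code. Several details are assumptions of this sketch: each node uses the bounding box of its points as its cell and stores their count and coordinate sum; the candidate z* is the center nearest the box midpoint; a center z is dropped for a whole subtree when, at the box vertex extreme in the direction z - z*, it is no closer than z*; and the leaf size, the median split on the widest dimension, and the rule that an empty cluster keeps its old center are arbitrary choices. All names (`KDNode`, `filter_node`, `lloyd_step_filtering`, `lloyd_step_bruteforce`) are hypothetical. A brute-force Lloyd step is included only as a reference for comparison.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class KDNode:
    """kd-tree node over the DATA points (not the centers).

    Stores the bounding box (lo, hi), the number of points below the node and
    their coordinate-wise sum, so a whole cell can be credited to one center
    in O(d) time.
    """

    def __init__(self, pts, leaf_size=8):
        d = len(pts[0])
        self.lo = [min(p[i] for p in pts) for i in range(d)]
        self.hi = [max(p[i] for p in pts) for i in range(d)]
        self.count = len(pts)
        self.wsum = [sum(p[i] for p in pts) for i in range(d)]
        self.pts, self.left, self.right = None, None, None
        if len(pts) <= leaf_size or self.lo == self.hi:
            self.pts = list(pts)  # leaf: keep the points themselves
            return
        # Assumed split rule: median of the widest dimension.
        axis = max(range(d), key=lambda i: self.hi[i] - self.lo[i])
        pts = sorted(pts, key=lambda p: p[axis])
        mid = len(pts) // 2
        self.left = KDNode(pts[:mid], leaf_size)
        self.right = KDNode(pts[mid:], leaf_size)


def filter_node(node, cands, centers, sums, counts):
    """Accumulate per-center coordinate sums and counts for points in `node`."""
    mid = [(l + h) / 2 for l, h in zip(node.lo, node.hi)]
    # z*: the candidate center closest to the cell's midpoint.
    zs = min(cands, key=lambda j: sq_dist(centers[j], mid))
    kept = [zs]
    for j in cands:
        if j == zs:
            continue
        # Box vertex extreme in the direction centers[j] - centers[zs].
        v = [h if a > b else l
             for a, b, l, h in zip(centers[j], centers[zs], node.lo, node.hi)]
        # If z_j is not strictly closer than z* even at v, it is not strictly
        # closer anywhere in the box, so it is filtered out for this subtree.
        if sq_dist(centers[j], v) < sq_dist(centers[zs], v):
            kept.append(j)
    if len(kept) == 1:
        # Every point in the cell goes to z*: credit the whole cell at once.
        counts[zs] += node.count
        sums[zs] = [s + w for s, w in zip(sums[zs], node.wsum)]
    elif node.pts is not None:
        for p in node.pts:  # leaf: explicit nearest-candidate search
            j = min(kept, key=lambda c: sq_dist(centers[c], p))
            counts[j] += 1
            sums[j] = [s + x for s, x in zip(sums[j], p)]
    else:
        filter_node(node.left, kept, centers, sums, counts)
        filter_node(node.right, kept, centers, sums, counts)


def _new_centers(centers, sums, counts):
    # Assumption: a center that receives no points keeps its old position.
    return [tuple(s / counts[j] for s in sums[j]) if counts[j] else tuple(centers[j])
            for j in range(len(centers))]


def lloyd_step_filtering(tree, centers):
    """One Lloyd iteration computed via the kd-tree filtering traversal."""
    k, d = len(centers), len(centers[0])
    sums, counts = [[0.0] * d for _ in range(k)], [0] * k
    filter_node(tree, list(range(k)), centers, sums, counts)
    return _new_centers(centers, sums, counts)


def lloyd_step_bruteforce(points, centers):
    """Reference Lloyd iteration: scan all k centers for every point."""
    k, d = len(centers), len(centers[0])
    sums, counts = [[0.0] * d for _ in range(k)], [0] * k
    for p in points:
        j = min(range(k), key=lambda c: sq_dist(centers[c], p))
        counts[j] += 1
        sums[j] = [s + x for s, x in zip(sums[j], p)]
    return _new_centers(centers, sums, counts)


if __name__ == "__main__":
    random.seed(1)
    pts = [(random.random(), random.random()) for _ in range(2000)]
    tree = KDNode(pts)  # built once over the data, reused every iteration
    centers = random.sample(pts, 5)
    for _ in range(10):
        centers = lloyd_step_filtering(tree, centers)
    print(centers)
```

Because the tree depends only on the fixed data points, it is built once and reused while the centers move between iterations. On data without distance ties, the filtering step should agree with the brute-force step up to floating-point rounding; the pruning only skips work, it does not change which center each point is assigned to.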

Original language: English
Pages: 100-109
Number of pages: 10
State: Published - 2000
Externally published: Yes
Event: 16th Annual Symposium on Computational Geometry - Hong Kong, Hong Kong
Duration: 12 Jun 2000 → 14 Jun 2000

