Stability yields a PTAS for k-median and k-means clustering

Pranjal Awasthi, Avrim Blum, Or Sheffet

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

65 Scopus citations

Abstract

We consider k-median clustering in finite metric spaces and k-means clustering in Euclidean spaces, in the setting where k is part of the input (not a constant). For the k-means problem, Ostrovsky et al. [18] show that if the optimal (k-1)-means clustering of the input is more expensive than the optimal k-means clustering by a factor of 1/ε², then one can achieve a (1 + f(ε))-approximation to the k-means optimum in time polynomial in n and k by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the (k-1)-means optimum is more expensive than the k-means optimum by a factor of 1+α for some constant α > 0, we can obtain a PTAS. In particular, under this assumption, for any ε > 0 we achieve a (1 + ε)-approximation to the k-means optimum in time polynomial in n and k, and exponential in 1/ε and 1/α. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the k-median problem in finite metrics under the analogous assumption. For k-means, we additionally give a randomized algorithm with an improved running time of n^{O(1)} (k log n)^{poly(1/ε, 1/α)}. Our technique also obtains a PTAS under the assumption of Balcan et al. [4] that all (1+α)-approximations are δ-close to a desired target clustering, in the case that all target clusters have size greater than δn and α > 0 is constant. Note that the motivation of Balcan et al. [4] is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from O(δ) to δ when all target clusters are large, and for k-median we improve the "largeness" condition needed in [4] to get exactly δ-close from O(δn) to δn. Our results are based on a new notion of clustering stability.
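The central assumption above compares two optimal costs: an instance qualifies when OPT_{k-1} ≥ (1+α) · OPT_k, where OPT_j is the optimal j-means cost. The Python sketch below is only meant to make that definition concrete on toy inputs. It is not the authors' PTAS (which runs in time polynomial in n and k); it finds each optimum by exhaustive search over all k^n labelings, which is exponential. The function names `kmeans_cost`, `optimal_kmeans_cost` and `is_alpha_stable` are hypothetical.

```python
import itertools
import math


def kmeans_cost(points, labels, k):
    # Sum of squared Euclidean distances from each point to the centroid of
    # its cluster; for a fixed partition the centroid is the optimal center.
    dim = len(points[0])
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            # Reject labelings that leave a cluster empty, so only
            # clusterings with exactly k nonempty clusters are counted.
            return math.inf
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        )
    return total


def optimal_kmeans_cost(points, k):
    # Exact optimum by enumerating all k**n labelings: exponential time,
    # usable only on toy inputs (unlike the polynomial-time PTAS in the paper).
    if k < 1 or len(points) < k:
        raise ValueError("need 1 <= k <= number of points")
    return min(
        kmeans_cost(points, labels, k)
        for labels in itertools.product(range(k), repeat=len(points))
    )


def is_alpha_stable(points, k, alpha):
    # Hypothetical helper: tests the assumption OPT_{k-1} >= (1 + alpha) * OPT_k.
    if k < 2:
        raise ValueError("the condition compares k-means with (k-1)-means; need k >= 2")
    opt_k = optimal_kmeans_cost(points, k)
    opt_k_minus_1 = optimal_kmeans_cost(points, k - 1)
    return opt_k_minus_1 >= (1 + alpha) * opt_k


if __name__ == "__main__":
    # Two well-separated pairs of points: merging them into one cluster is costly.
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(optimal_kmeans_cost(pts, 1), optimal_kmeans_cost(pts, 2))  # 101.0 1.0
    print(is_alpha_stable(pts, 2, alpha=1.0))  # True
```

In this toy example the ratio OPT_1 / OPT_2 is 101, so the instance satisfies the condition for any α up to 100. Evenly spaced points on a line, by contrast, give a small ratio and fail the condition for large α. This illustrates that the assumption rules out instances where dropping one cluster barely changes the cost.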

Original language: English
Title of host publication: Proceedings - 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, FOCS 2010
Publisher: IEEE Computer Society
Pages: 309-318
Number of pages: 10
ISBN (Print): 9780769542447
State: Published - 2010
Externally published: Yes
Event: 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, FOCS 2010 - Las Vegas, NV, United States
Duration: 23 Oct 2010 to 26 Oct 2010

Publication series

Name: Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
ISSN (Print): 0272-5428

Conference

Conference: 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, FOCS 2010
Country/Territory: United States
City: Las Vegas, NV
Period: 23/10/10 to 26/10/10
