TY - GEN
T1 - Stability yields a PTAS for k-median and k-means clustering
AU - Awasthi, Pranjal
AU - Blum, Avrim
AU - Sheffet, Or
PY - 2010
Y1 - 2010
N2 - We consider k-median clustering in finite metric spaces and k-means clustering in Euclidean spaces, in the setting where k is part of the input (not a constant). For the k-means problem, Ostrovsky et al. [18] show that if the optimal (k-1)-means clustering of the input is more expensive than the optimal k-means clustering by a factor of 1/ε^2, then one can achieve a (1 + f(ε))-approximation to the k-means optimal in time polynomial in n and k by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the (k-1)-means optimal is more expensive than the k-means optimal by a factor 1+α for some constant α > 0, we can obtain a PTAS. In particular, under this assumption, for any ε > 0 we achieve a (1 + ε)-approximation to the k-means optimal in time polynomial in n and k, and exponential in 1/ε and 1/α. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the k-median problem in finite metrics under the analogous assumption as well. For k-means, we in addition give a randomized algorithm with improved running time of n^{O(1)} (k log n)^{poly(1/ε, 1/α)}. Our technique also obtains a PTAS under the assumption of Balcan et al. [4] that all (1+α) approximations are δ-close to a desired target clustering, in the case that all target clusters have size greater than δn and α > 0 is constant. Note that the motivation of Balcan et al. [4] is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from O(δ) to δ when all target clusters are large, and for k-median we improve the "largeness" condition needed in [4] to get exactly δ-close from O(δn) to δn. Our results are based on a new notion of clustering stability.
AB - We consider k-median clustering in finite metric spaces and k-means clustering in Euclidean spaces, in the setting where k is part of the input (not a constant). For the k-means problem, Ostrovsky et al. [18] show that if the optimal (k-1)-means clustering of the input is more expensive than the optimal k-means clustering by a factor of 1/ε^2, then one can achieve a (1 + f(ε))-approximation to the k-means optimal in time polynomial in n and k by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the (k-1)-means optimal is more expensive than the k-means optimal by a factor 1+α for some constant α > 0, we can obtain a PTAS. In particular, under this assumption, for any ε > 0 we achieve a (1 + ε)-approximation to the k-means optimal in time polynomial in n and k, and exponential in 1/ε and 1/α. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the k-median problem in finite metrics under the analogous assumption as well. For k-means, we in addition give a randomized algorithm with improved running time of n^{O(1)} (k log n)^{poly(1/ε, 1/α)}. Our technique also obtains a PTAS under the assumption of Balcan et al. [4] that all (1+α) approximations are δ-close to a desired target clustering, in the case that all target clusters have size greater than δn and α > 0 is constant. Note that the motivation of Balcan et al. [4] is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from O(δ) to δ when all target clusters are large, and for k-median we improve the "largeness" condition needed in [4] to get exactly δ-close from O(δn) to δn. Our results are based on a new notion of clustering stability.
UR - http://www.scopus.com/inward/record.url?scp=78751523552&partnerID=8YFLogxK
U2 - 10.1109/FOCS.2010.36
DO - 10.1109/FOCS.2010.36
M3 - Conference contribution
AN - SCOPUS:78751523552
SN - 9780769542447
T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
SP - 309
EP - 318
BT - Proceedings - 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, FOCS 2010
PB - IEEE Computer Society
T2 - 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, FOCS 2010
Y2 - 23 October 2010 through 26 October 2010
ER -