Abstract
We study efficient algorithms for the Euclidean k-Center problem, focusing on the regime of large k. We take the approach of data reduction by considering α-coreset, which is a small subset S of the dataset P such that any β-approximation on S is an (α + β)-approximation on P. We give efficient algorithms to construct coresets whose size is k · o(n), which immediately speeds up existing approximation algorithms. Notably, we obtain a near-linear time O(1)-approximation when k = nc for any 0 < c < 1. We validate the performance of our coresets on real-world datasets with large k, and we observe that the coreset speeds up the well-known Gonzalez algorithm by up to 4 times, while still achieving similar clustering cost. Technically, one of our coreset results is based on a new efficient construction of consistent hashing with competitive parameters. This general tool may be of independent interest for algorithm design in high dimensional Euclidean spaces.
| Original language | English |
|---|---|
| Pages (from-to) | 17189-17202 |
| Number of pages | 14 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 267 |
| State | Published - 2025 |
| Event | 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada Duration: 13 Jul 2025 → 19 Jul 2025 |
Bibliographical note
Publisher Copyright:© 2025 by the author(s).