## Abstract

A set X of points in R^{d} is (k, b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that by sampling from a set X, distinguish between the case that X is (k, b)-clusterable and the case that X is ε-far from being (k, b′)-clusterable for any given 0<ε≤1 and for b′≥b. In ε-far from being (k, b′)-clusterable we mean that more than ε·|X| points should be removed from X so that it becomes (k, b′)-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X|, and polynomial in k and 1/ε. Our algorithms can also be used to find approximately good clusterings. Namely, these are clusterings of all but an ε-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of |X|. That is, without actually having to partition all points in X, the implicit representation can be used to answer queries concerning the cluster any given point belongs to.

Original language | English (US) |
---|---|

Pages (from-to) | 240-250 |

Number of pages | 11 |

Journal | Annual Symposium on Foundations of Computer Science - Proceedings |

State | Published - Dec 1 2000 |

Externally published | Yes |

Event | 41st Annual Symposium on Foundations of Computer Science (FOCS 2000) - Redondo Beach, CA, USA Duration: Nov 12 2000 → Nov 14 2000 |

## All Science Journal Classification (ASJC) codes

- Hardware and Architecture