Testing of clustering

Noga Alon, Seannie Dar, Michal Parnas, Dana Ron

Research output: Contribution to journalConference article

30 Scopus citations

Abstract

A set X of points in Rd is (k, b)-clusterable if X can be partitioned into k subsets (clusters) so that the diameter (alternatively, the radius) of each cluster is at most b. We present algorithms that by sampling from a set X, distinguish between the case that X is (k, b)-clusterable and the case that X is ε-far from being (k, b′)-clusterable for any given 0<ε≤1 and for b′≥b. In ε-far from being (k, b′)-clusterable we mean that more than ε·|X| points should be removed from X so that it becomes (k, b′)-clusterable. We give algorithms for a variety of cost measures that use a sample of size independent of |X|, and polynomial in k and 1/ε. Our algorithms can also be used to find approximately good clusterings. Namely, these are clusterings of all but an ε-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of |X|. That is, without actually having to partition all points in X, the implicit representation can be used to answer queries concerning the cluster any given point belongs to.

Original languageEnglish (US)
Pages (from-to)240-250
Number of pages11
JournalAnnual Symposium on Foundations of Computer Science - Proceedings
StatePublished - Dec 1 2000
Externally publishedYes
Event41st Annual Symposium on Foundations of Computer Science (FOCS 2000) - Redondo Beach, CA, USA
Duration: Nov 12 2000Nov 14 2000

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Fingerprint Dive into the research topics of 'Testing of clustering'. Together they form a unique fingerprint.

  • Cite this