Consistent procedures for cluster tree estimation and pruning

Kamalika Chaudhuri, Sanjoy Dasgupta, Samory Kpotufe, Ulrike Von Luxburg

Research output: Contribution to journalArticle

23 Scopus citations

Abstract

For a density f on Rd, a high-density cluster is any connected component of {x: f(x) ≥ λ} , for some λ > 0. The set of all high-density clusters forms a hierarchy called the cluster tree of f. We present two procedures for estimating the cluster tree given samples from f. The first is a robust variant of the single linkage algorithm for hierarchical clustering. The second is based on the k -nearest neighbor graph of the samples. We give finite-sample convergence rates for these algorithms, which also imply consistency, and we derive lower bounds on the sample complexity of cluster tree estimation. Finally, we study a tree pruning procedure that guarantees, under milder conditions than usual, to remove clusters that are spurious while recovering those that are salient.

Original languageEnglish (US)
Article number6915900
Pages (from-to)7900-7912
Number of pages13
JournalIEEE Transactions on Information Theory
Volume60
Issue number12
DOIs
StatePublished - Dec 1 2014

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences

Keywords

  • Clustering algorithms
  • convergence

Fingerprint Dive into the research topics of 'Consistent procedures for cluster tree estimation and pruning'. Together they form a unique fingerprint.

  • Cite this

    Chaudhuri, K., Dasgupta, S., Kpotufe, S., & Von Luxburg, U. (2014). Consistent procedures for cluster tree estimation and pruning. IEEE Transactions on Information Theory, 60(12), 7900-7912. [6915900]. https://doi.org/10.1109/TIT.2014.2361055