Abstract
For a density f on Rd, a high-density cluster is any connected component of {x: f(x) ≥ λ} , for some λ > 0. The set of all high-density clusters forms a hierarchy called the cluster tree of f. We present two procedures for estimating the cluster tree given samples from f. The first is a robust variant of the single linkage algorithm for hierarchical clustering. The second is based on the k -nearest neighbor graph of the samples. We give finite-sample convergence rates for these algorithms, which also imply consistency, and we derive lower bounds on the sample complexity of cluster tree estimation. Finally, we study a tree pruning procedure that guarantees, under milder conditions than usual, to remove clusters that are spurious while recovering those that are salient.
Original language | English (US) |
---|---|
Article number | 6915900 |
Pages (from-to) | 7900-7912 |
Number of pages | 13 |
Journal | IEEE Transactions on Information Theory |
Volume | 60 |
Issue number | 12 |
DOIs | |
State | Published - Dec 1 2014 |
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Library and Information Sciences
Keywords
- Clustering algorithms
- convergence