Pruning nearest neighbor cluster trees

Samory Kpotufe, Ulrike Von Luxburg

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

24 Scopus citations

Abstract

Nearest neighbor (k-NN) graphs are widely used in machine learning and data mining applications, and our aim is to better understand what they reveal about the cluster structure of the unknown underlying distribution of points. Moreover, is it possible to identify spurious structures that might arise due to sampling variability? Our first contribution is a statistical analysis that reveals how certain subgraphs of a k-NN graph form a consistent estimator of the cluster tree of the underlying distribution of points. Our second and perhaps most important contribution is the following finite sample guarantee. We carefully work out the tradeoff between aggressive and conservative pruning and are able to guarantee the removal of all spurious cluster structures at all levels of the tree while at the same time guaranteeing the recovery of salient clusters. This is the first such finite sample result in the context of clustering.
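To make the idea of reading cluster structure off subgraphs of a k-NN graph concrete, here is a minimal Python sketch. It is an illustration only, not the authors' construction: the function names `knn_lists` and `level_components` and the `radius` parameter are hypothetical, and the pruning step that the abstract emphasizes is deliberately left out. The sketch assumes that a point lies in a high-density region when the distance to its k-th nearest neighbor is small, keeps only such points, and reports the connected components of the k-NN graph restricted to them.

```python
import math


def knn_lists(points, k):
    """For each point, return its k nearest neighbours as (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append(dists[:k])
    return result


def level_components(points, k, radius):
    """Connected components of the k-NN graph restricted to points whose
    k-NN radius (distance to the k-th neighbour) is at most `radius`.

    Illustrative assumption: a small k-NN radius stands in for high density.
    No pruning of spurious components is performed here.
    """
    nbrs = knn_lists(points, k)
    keep = {i for i, nb in enumerate(nbrs) if nb[-1][0] <= radius}

    # Undirected adjacency among the retained points only.
    adj = {i: set() for i in keep}
    for i in keep:
        for _, j in nbrs[i]:
            if j in keep:
                adj[i].add(j)
                adj[j].add(i)

    # Depth-first search to collect components.
    components, seen = [], set()
    for start in sorted(keep):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(comp))
    return components
```

Calling `level_components` over a decreasing sequence of `radius` values gives a nested family of components, which is the sense in which such subgraphs can be read as a tree. Deciding which of those components are spurious is the pruning question the paper addresses, and it is not attempted in this sketch.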

Original language: English (US)
Title of host publication: Proceedings of the 28th International Conference on Machine Learning, ICML 2011
Pages: 225-232
Number of pages: 8
State: Published - 2011
Event: 28th International Conference on Machine Learning, ICML 2011 - Bellevue, WA, United States
Duration: Jun 28 2011 to Jul 2 2011

Publication series

Name: Proceedings of the 28th International Conference on Machine Learning, ICML 2011

Other

Other: 28th International Conference on Machine Learning, ICML 2011
Country/Territory: United States
City: Bellevue, WA
Period: 6/28/11 to 7/2/11

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Human-Computer Interaction
  • Education
