Diffusion maps - A probabilistic interpretation for spectral embedding and clustering algorithms

Boaz Nadler, Stephane Lafon, Ronald Coifman, Ioannis G. Kevrekidis

Research output: Chapter in Book/Report/Conference proceedingConference contribution

37 Scopus citations

Abstract

Spectral embedding and spectral clustering are common methods for non-linear dimensionality reduction and clustering of complex high dimensional datasets. In this paper we provide a diffusion based probabilistic analysis of algorithms that use the normalized graph Laplacian. Given the pairwise adjacency matrix of all points in a dataset, we define a random walk on the graph of points and a diffusion distance between any two points. We show that the diffusion distance is equal to the Euclidean distance in the embedded space with all eigenvectors of the normalized graph Laplacian. This identity shows that characteristic relaxation times and processes of the random walk on the graph are the key concept that governs the properties of these spectral clustering and spectral embedding algorithms. Specifically, for spectral clustering to succeed, a necessary condition is that the mean exit times from each cluster need to be significantly larger than the largest (slowest) of all relaxation times inside all of the individual clusters. For complex, multiscale data, this condition may not hold and multiscale methods need to be developed to handle such situations.

Original languageEnglish (US)
Title of host publicationPrincipal Manifolds for Data Visualization and Dimension Reduction
Pages238-260
Number of pages23
DOIs
StatePublished - 2008

Publication series

NameLecture Notes in Computational Science and Engineering
Volume58
ISSN (Print)1439-7358

All Science Journal Classification (ASJC) codes

  • Modeling and Simulation
  • General Engineering
  • Discrete Mathematics and Combinatorics
  • Control and Optimization
  • Computational Mathematics

Fingerprint

Dive into the research topics of 'Diffusion maps - A probabilistic interpretation for spectral embedding and clustering algorithms'. Together they form a unique fingerprint.

Cite this