TY - GEN
T1 - Diffusion maps - A probabilistic interpretation for spectral embedding and clustering algorithms
AU - Nadler, Boaz
AU - Lafon, Stephane
AU - Coifman, Ronald
AU - Kevrekidis, Ioannis G.
PY - 2008
Y1 - 2008
N2 - Spectral embedding and spectral clustering are common methods for non-linear dimensionality reduction and clustering of complex high dimensional datasets. In this paper we provide a diffusion based probabilistic analysis of algorithms that use the normalized graph Laplacian. Given the pairwise adjacency matrix of all points in a dataset, we define a random walk on the graph of points and a diffusion distance between any two points. We show that the diffusion distance is equal to the Euclidean distance in the embedded space with all eigenvectors of the normalized graph Laplacian. This identity shows that characteristic relaxation times and processes of the random walk on the graph are the key concept that governs the properties of these spectral clustering and spectral embedding algorithms. Specifically, for spectral clustering to succeed, a necessary condition is that the mean exit times from each cluster need to be significantly larger than the largest (slowest) of all relaxation times inside all of the individual clusters. For complex, multiscale data, this condition may not hold and multiscale methods need to be developed to handle such situations.
AB - Spectral embedding and spectral clustering are common methods for non-linear dimensionality reduction and clustering of complex high dimensional datasets. In this paper we provide a diffusion based probabilistic analysis of algorithms that use the normalized graph Laplacian. Given the pairwise adjacency matrix of all points in a dataset, we define a random walk on the graph of points and a diffusion distance between any two points. We show that the diffusion distance is equal to the Euclidean distance in the embedded space with all eigenvectors of the normalized graph Laplacian. This identity shows that characteristic relaxation times and processes of the random walk on the graph are the key concept that governs the properties of these spectral clustering and spectral embedding algorithms. Specifically, for spectral clustering to succeed, a necessary condition is that the mean exit times from each cluster need to be significantly larger than the largest (slowest) of all relaxation times inside all of the individual clusters. For complex, multiscale data, this condition may not hold and multiscale methods need to be developed to handle such situations.
UR - http://www.scopus.com/inward/record.url?scp=77952984416&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952984416&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-73750-6_10
DO - 10.1007/978-3-540-73750-6_10
M3 - Conference contribution
AN - SCOPUS:77952984416
SN - 9783540737490
T3 - Lecture Notes in Computational Science and Engineering
SP - 238
EP - 260
BT - Principal Manifolds for Data Visualization and Dimension Reduction
ER -