AN ℓp THEORY OF PCA AND SPECTRAL CLUSTERING

Emmanuel Abbe, Jianqing Fan, Kaizheng Wang

Research output: Contribution to journal › Article › peer-review


Abstract

Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning. While existing studies of PCA focus on the recovery of principal components and their associated eigenvalues, there are few precise characterizations of individual principal component scores that yield low-dimensional embeddings of samples. This hinders the analysis of various spectral methods. In this paper, we first develop an ℓp perturbation theory for a hollowed version of PCA in Hilbert spaces which provably improves upon vanilla PCA in the presence of heteroscedastic noise. Through a novel ℓp analysis of eigenvectors, we investigate entrywise behaviors of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in ℓp norm, which includes ℓ2 and ℓ∞ as special cases. For sub-Gaussian mixture models, the choice of p giving optimal bounds depends on the signal-to-noise ratio, which further yields optimality guarantees for spectral clustering. For contextual community detection, the ℓp theory leads to simple spectral algorithms that achieve the information threshold for exact recovery and the optimal misclassification rate.
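
The sketch below illustrates the kind of procedure the abstract refers to: a "hollowed" PCA (eigendecomposition of the Gram matrix with its diagonal zeroed out) followed by k-means on the resulting principal component scores, applied to a toy sub-Gaussian mixture. The toy data, parameter choices, and function name are illustrative assumptions, not code from the paper.

```python
# Minimal sketch (assumed implementation): hollowed PCA + spectral clustering.
import numpy as np
from sklearn.cluster import KMeans

def hollowed_pca_embedding(X, n_components):
    """Top eigenvectors of the Gram matrix with its diagonal zeroed out.

    Zeroing the diagonal ("hollowing") discards the per-sample noise-variance
    terms that vanilla PCA keeps, which is what helps under heteroscedastic noise.
    """
    G = X @ X.T                        # n x n Gram matrix
    np.fill_diagonal(G, 0.0)           # hollowing step
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]           # principal component score vectors

# Toy two-component Gaussian mixture (assumed example, not from the paper).
rng = np.random.default_rng(0)
n, d = 200, 50
labels_true = rng.integers(0, 2, size=n)
means = np.vstack([np.ones(d), -np.ones(d)])
X = means[labels_true] + rng.normal(scale=3.0, size=(n, d))

# Spectral clustering: k-means on the hollowed-PCA embedding.
embedding = hollowed_pca_embedding(X, n_components=2)
labels_hat = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)
accuracy = max(np.mean(labels_hat == labels_true), np.mean(labels_hat != labels_true))
print(f"clustering accuracy: {accuracy:.2f}")
```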

Original language: English (US)
Pages (from-to): 2359-2385
Number of pages: 27
Journal: Annals of Statistics
Volume: 50
Issue number: 4
DOIs
State: Published - Aug 2022

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Keywords

  • Principal component analysis
  • community detection
  • contextual network models
  • eigenvector perturbation
  • mixture models
  • phase transitions
  • spectral clustering
