INFERENCE FOR HETEROSKEDASTIC PCA WITH MISSING DATA

Yuling Yan, Yuxin Chen, Jianqing Fan

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

This paper studies how to construct confidence regions for principal component analysis (PCA) in high dimension, a problem that has been vastly underexplored. While computing measures of uncertainty for nonlinear/nonconvex estimators is in general difficult in high dimension, the challenge is further compounded by the prevalent presence of missing data and heteroskedastic noise. We propose a novel approach to performing valid inference on the principal subspace, on the basis of an estimator called HeteroPCA (Ann. Statist. 50 (2022b) 53–80). We develop nonasymptotic distributional guarantees for HeteroPCA, and demonstrate how these can be invoked to compute both confidence regions for the principal subspace and entrywise confidence intervals for the spiked covariance matrix. Our inference procedures are fully data-driven and adaptive to heteroskedastic random noise, without requiring prior knowledge about the noise levels.

Original language: English (US)
Pages (from-to): 729-756
Number of pages: 28
Journal: Annals of Statistics
Volume: 52
Issue number: 2
State: Published - Apr 2024
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Keywords

  • Principal component analysis
  • confidence regions
  • heteroskedastic data
  • missing data
  • subspace estimation
  • uncertainty quantification
