TY - GEN
T1 - Semiparametric principal component analysis
AU - Han, Fang
AU - Liu, Han
PY - 2012
Y1 - 2012
N2 - We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The according methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multi-variate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman's rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012).
AB - We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The according methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multi-variate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman's rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012).
UR - http://www.scopus.com/inward/record.url?scp=84877728321&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84877728321&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84877728321
SN - 9781627480031
T3 - Advances in Neural Information Processing Systems
SP - 171
EP - 179
BT - Advances in Neural Information Processing Systems 25
T2 - 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
Y2 - 3 December 2012 through 6 December 2012
ER -