TY - JOUR
T1 - High-dimensional factor model and its applications to statistical machine learning
AU - Chen, Zhao
AU - Fan, Jianqing
AU - Wang, Dan Christina
N1 - Publisher Copyright:
© 2020, Science China Press. All rights reserved.
PY - 2020/4/1
Y1 - 2020/4/1
N2 - This paper reviews recent developments in factor models and their applications to statistical machine learning. The factor model reduces the dimensionality of variables and provides a low-rank-plus-sparse structure for high-dimensional covariance matrices. It has therefore attracted much attention in high-dimensional data analysis and has been widely applied across the sciences, engineering, humanities, and social sciences, including economics, finance, genomics, neuroscience, and machine learning. We elaborate on how to use principal component analysis (PCA) to extract latent factors and to estimate the associated factor loadings, idiosyncratic components, and covariance matrices. These methods have been shown to cope effectively with the challenges of big data, such as high dimensionality, strong dependence, heavy-tailed variables, and heterogeneity. We also focus on the role of the factor model in high-dimensional statistical learning problems such as covariance matrix estimation, model selection, multiple testing, and prediction. Finally, we illustrate the innate relationships between factor models and modern machine learning problems through several applications, including network analysis, matrix completion, ranking, and mixture models.
AB - This paper reviews recent developments in factor models and their applications to statistical machine learning. The factor model reduces the dimensionality of variables and provides a low-rank-plus-sparse structure for high-dimensional covariance matrices. It has therefore attracted much attention in high-dimensional data analysis and has been widely applied across the sciences, engineering, humanities, and social sciences, including economics, finance, genomics, neuroscience, and machine learning. We elaborate on how to use principal component analysis (PCA) to extract latent factors and to estimate the associated factor loadings, idiosyncratic components, and covariance matrices. These methods have been shown to cope effectively with the challenges of big data, such as high dimensionality, strong dependence, heavy-tailed variables, and heterogeneity. We also focus on the role of the factor model in high-dimensional statistical learning problems such as covariance matrix estimation, model selection, multiple testing, and prediction. Finally, we illustrate the innate relationships between factor models and modern machine learning problems through several applications, including network analysis, matrix completion, ranking, and mixture models.
KW - Factor model
KW - Factor-adjusted method
KW - Model selection
KW - Multiple testing
KW - PCA
KW - Structural covariance matrix
UR - http://www.scopus.com/inward/record.url?scp=85095609311&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095609311&partnerID=8YFLogxK
U2 - 10.1360/SSM-2020-0041
DO - 10.1360/SSM-2020-0041
M3 - Review article
AN - SCOPUS:85095609311
SN - 1674-7216
VL - 50
SP - 447
EP - 490
JO - Scientia Sinica Mathematica
JF - Scientia Sinica Mathematica
IS - 4
ER -