Abstract
Factor and sparse models are widely used to impose low-dimensional structure in high dimensions. However, they are seemingly mutually exclusive. We propose a lifting method that combines the merits of these two models in a supervised learning methodology that allows us to efficiently explore all the information in high-dimensional datasets. The method is based on a flexible model for high-dimensional panel data with observable and/or latent common factors and idiosyncratic components, called the factor-augmented regression model. The model includes principal component regression and sparse regression as special cases, significantly weakens the cross-sectional dependence, and facilitates model selection and interpretability. The method consists of several steps and a novel test for (partial) covariance structure in high dimensions to infer the remaining cross-sectional dependence at each step. We develop the theory for the model and demonstrate the validity of the multiplier bootstrap for testing a high-dimensional (partial) covariance structure. A simulation study and applications support the theory.
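To illustrate the general idea behind factor-augmented sparse regression, the sketch below estimates latent factors by principal components, regresses the response on the estimated factors, and then fits a sparse (l1-penalized) regression of the factor-adjusted response on the idiosyncratic components. This is a minimal sketch under assumed choices (number of factors `k`, penalty level `lam`, and the hypothetical helper names `estimate_factors` and `factor_augmented_regression`); it is not the authors' exact estimator, and the covariance-structure test used at each step of the paper's procedure is not shown.

```python
# Minimal sketch of a factor-augmented sparse regression, assuming the
# covariates X (n x p) follow an approximate factor model X = F B' + U and
# the response obeys y = F alpha + U beta + eps with a sparse beta.
# The helper names, k, and lam are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso

def estimate_factors(X, k):
    """Estimate k latent factors of X (n x p) by principal components."""
    Xc = X - X.mean(axis=0)                      # center columns
    U_svd, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    F = np.sqrt(Xc.shape[0]) * U_svd[:, :k]      # estimated factors (n x k)
    B = Xc.T @ F / Xc.shape[0]                   # estimated loadings (p x k)
    U = Xc - F @ B.T                             # idiosyncratic components
    return F, B, U

def factor_augmented_regression(X, y, k=3, lam=0.1):
    """Regress y on estimated factors (unpenalized) and then fit the
    factor-adjusted response on the idiosyncratic components with lasso."""
    F, B, U = estimate_factors(X, k)
    yc = y - y.mean()
    alpha, *_ = np.linalg.lstsq(F, yc, rcond=None)   # factor regression
    resid = yc - F @ alpha                           # factor-adjusted response
    lasso = Lasso(alpha=lam, fit_intercept=False).fit(U, resid)
    return alpha, lasso.coef_

# Toy usage on simulated data matching the assumed model.
rng = np.random.default_rng(0)
n, p, k = 200, 100, 3
F0 = rng.standard_normal((n, k))
B0 = rng.standard_normal((p, k))
U0 = rng.standard_normal((n, p))
X = F0 @ B0.T + U0
beta0 = np.zeros(p); beta0[:5] = 1.0             # sparse idiosyncratic effect
y = F0 @ np.ones(k) + U0 @ beta0 + rng.standard_normal(n)
alpha_hat, beta_hat = factor_augmented_regression(X, y, k=k, lam=0.1)
print("nonzero idiosyncratic coefficients:", np.flatnonzero(beta_hat))
```

Regressing on the factors first and penalizing only the idiosyncratic part reflects the lifting idea in the abstract: the factor step absorbs the strong cross-sectional dependence, after which sparse regression on the weakly dependent residual components becomes meaningful.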
Original language | English (US) |
---|---|
Pages (from-to) | 1692-1717 |
Number of pages | 26 |
Journal | Annals of Statistics |
Volume | 51 |
Issue number | 4 |
DOIs | |
State | Published - Aug 2023 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty
Keywords
- (partial) covariance structure
- factor models
- high-dimensional inference
- machine learning
- penalized least-squares
- prediction
- supervised learning