Abstract
Many high dimensional classification techniques based on sparse linear discriminant analysis have been proposed in the literature. To use them effectively, sparsity of the linear classifier is a prerequisite. However, such sparsity may not be readily available in many applications, and rotations of the data may be required to create it. We propose a family of rotations for this purpose. The basic idea is to rotate the data using the principal components of the sample covariance matrix of the pooled samples, or its variants, and then to apply an existing high dimensional classifier. This rotate-and-solve procedure can be combined with any existing classifier and is robust to the level of sparsity of the true model. We show that these rotations do create the sparsity needed for high dimensional classification, and we provide a theoretical understanding of why such rotations work empirically. The effectiveness of the proposed method is demonstrated on several simulated and real data examples, which clearly show its improvements over some popular high dimensional classification rules.
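The rotate-and-solve idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the rotation uses the eigenvectors of the pooled sample covariance, and an L1-penalized logistic regression stands in for whichever sparse linear classifier one chooses to apply after the rotation (the function name `rotate_and_solve` and all tuning choices are assumptions).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def rotate_and_solve(X, y):
    # Pooled sample covariance of the centered data
    # (the paper also considers variants of this matrix).
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (len(X) - 1)
    # Principal components of the pooled covariance define the rotation;
    # columns of R are orthonormal eigenvectors.
    _, R = np.linalg.eigh(S)
    Z = X @ R  # rotate the data into the principal-component basis
    # Apply an existing sparse linear classifier to the rotated data;
    # here an L1-penalized logistic regression serves as a stand-in.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    clf.fit(Z, y)
    return R, clf
```

Because the rotation is orthogonal, distances and the Bayes risk are unchanged; the point is only that the linear classifier becomes (approximately) sparse in the rotated coordinates, so sparse methods can then be applied effectively.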
Original language | English (US) |
---|---|
Pages (from-to) | 827-851 |
Number of pages | 25 |
Journal | Journal of the Royal Statistical Society. Series B: Statistical Methodology |
Volume | 77 |
Issue number | 4 |
DOIs | |
State | Published - Sep 1 2015 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty
Keywords
- Classification
- Equivariance
- High dimensional data
- Linear discriminant analysis
- Principal components
- Rotate-and-solve procedure