Ultrahigh-dimensional variable selection plays an increasingly important role in contemporary scientific discoveries and statistical research. Among others, Fan and Lv [J. R. Stat. Soc. Ser. B Stat. Methodol. 70 (2008) 849-911] propose an independent screening framework by ranking the marginal correlations. They showed that the correlation ranking procedure possesses a sure independence screening property within the context of the linear model with Gaussian covariates and responses. In this paper, we propose a more general version of the independent learning with ranking the maximum marginal likelihood estimates or themaximum marginal likelihood itself in generalized linear models. We show that the proposed methods, with Fan and Lv [J. R. Stat. Soc. Ser. B Stat. Methodol. 70 (2008) 849-911] as a very special case, also possess the sure screening property with vanishing false selection rate. The conditions under which the independence learning possesses a sure screening is surprisingly simple. This justifies the applicability of such a simple method in a wide spectrum. We quantify explicitly the extent to which the dimensionality can be reduced by independence screening, which depends on the interactions of the covariance matrix of covariates and true parameters. Simulation studies are used to illustrate the utility of the proposed approaches. In addition, we establish an exponential inequality for the quasi-maximum likelihood estimator which is useful for high-dimensional statistical learning.
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty
- Generalized linear models
- Independent learning
- Sure independent screening
- Variable selection