TY - JOUR
T1 - The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled Chi-square
AU - Sur, Pragya
AU - Chen, Yuxin
AU - Candès, Emmanuel J.
N1 - Funding Information:
E. C. was partially supported by the Office of Naval Research under grant N00014-16-1-2712, and by the Math + X Award from the Simons Foundation. P. S. was partially supported by the Ric Weiland Graduate Fellowship in the School of Humanities and Sciences, Stanford University. Y. C. is supported in part by the AFOSR YIP award FA9550-19-1-0030, by the ARO grant W911NF-18-1-0303, and by the Princeton SEAS innovation award. P. S. and Y. C. are grateful to Andrea Montanari for his help in understanding AMP and [22]. Y. C. thanks Kaizheng Wang and Cong Ma for helpful discussion about [25], and P. S. thanks Subhabrata Sen for several helpful discussions regarding this project. E. C. would like to thank Iain Johnstone for a helpful discussion as well.
Publisher Copyright:
© 2019, Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2019/10/1
Y1 - 2019/10/1
AB - Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test (LRT). Indeed, Wilks’ theorem asserts that whenever we have a fixed number p of variables, twice the log-likelihood ratio (LLR) 2Λ is distributed as a χ²_k variable in the limit of large sample sizes n; here, χ²_k is a Chi-square with k degrees of freedom and k the number of variables being tested. In this paper, we prove that when p is not negligible compared to n, Wilks’ theorem does not hold and that the Chi-square approximation is grossly incorrect; in fact, this approximation produces p-values that are far too small (under the null hypothesis). Assume that n and p grow large in such a way that p/n → κ for some constant κ < 1/2. (For κ > 1/2, 2Λ converges in probability to 0, so that the LRT is not interesting in this regime.) We prove that for a class of logistic models, the LLR converges in distribution to a rescaled Chi-square, namely, 2Λ → α(κ)χ²_k, where the scaling factor α(κ) is greater than one as soon as the dimensionality ratio κ is positive. Hence, the LLR is larger than classically assumed. For instance, when κ = 0.3, α(κ) ≈ 1.5. In general, we show how to compute the scaling factor by solving a nonlinear system of two equations with two unknowns. Our mathematical arguments are involved and use techniques from approximate message passing theory, from non-asymptotic random matrix theory, and from convex geometry. We also complement our mathematical study by showing that the new limiting distribution is accurate for finite sample sizes. Finally, all the results from this paper extend to some other regression models such as the probit regression model.
KW - Approximate message passing
KW - Concentration inequalities
KW - Convex geometry
KW - Goodness of fit
KW - High-dimensionality
KW - Leave-one-out analysis
KW - Likelihood-ratio tests
KW - Logistic regression
KW - Wilks’ theorem
UR - http://www.scopus.com/inward/record.url?scp=85060607961&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060607961&partnerID=8YFLogxK
U2 - 10.1007/s00440-018-00896-9
DO - 10.1007/s00440-018-00896-9
M3 - Article
AN - SCOPUS:85060607961
SN - 0178-8051
VL - 175
SP - 487
EP - 558
JO - Probability Theory and Related Fields
JF - Probability Theory and Related Fields
IS - 1-2
ER -