TY - JOUR
T1 - Adaptive Huber Regression
AU - Sun, Qiang
AU - Zhou, Wen Xin
AU - Fan, Jianqing
N1 - Funding Information:
This work is supported by a Connaught Award, NSERC Grant RGPIN-2018-06484, NSF Grants DMS-1662139, DMS-1712591, and DMS-1811376, NIH Grant 2R01-GM072611-14, and NSFC Grant 11690014. The authors thank the editor, associate editor, and two anonymous referees for their valuable comments.
Publisher Copyright:
© 2019 American Statistical Association.
PY - 2020/1/2
Y1 - 2020/1/2
N2 - Big data can easily be contaminated by outliers or contain variables with heavy-tailed distributions, which makes many conventional methods inadequate. To address this challenge, we propose the adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments for optimal tradeoff between bias and robustness. Our theoretical framework deals with heavy-tailed distributions with bounded (Formula presented.) th moment for any (Formula presented.). We establish a sharp phase transition for robust estimation of regression parameters in both low and high dimensions: when (Formula presented.), the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime (Formula presented.) and the transition is smooth and optimal. In addition, we extend the methodology to allow both heavy-tailed predictors and observation noise. Simulation studies lend further support to the theory. In a genetic study of cancer cell lines that exhibit heavy-tailedness, the proposed methods are shown to be more robust and predictive. Supplementary materials for this article are available online.
KW - Adaptive Huber regression
KW - Bias and robustness tradeoff
KW - Finite-sample inference
KW - Heavy-tailed data
KW - Nonasymptotic optimality
KW - Phase transition
UR - http://www.scopus.com/inward/record.url?scp=85063152344&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063152344&partnerID=8YFLogxK
U2 - 10.1080/01621459.2018.1543124
DO - 10.1080/01621459.2018.1543124
M3 - Article
C2 - 33139964
AN - SCOPUS:85063152344
SN - 0162-1459
VL - 115
SP - 254
EP - 265
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 529
ER -