TY - JOUR
T1 - A new perspective on robust M-estimation: Finite sample theory and applications to dependence-adjusted multiple testing
AU - Zhou, Wen-Xin
AU - Bose, Koushiki
AU - Fan, Jianqing
AU - Liu, Han
N1 - Funding Information:
Received June 2016; revised April 2017. [1] Supported in part by NIH Grant R01-GM072611. [2] Supported in part by NSF Grant DGE-1148900. [3] Corresponding author; supported in part by NIH Grant R01-GM072611, NSF Grant DMS-1206464, and the Science and Technology Commission of Shanghai Municipality under contract number 16JC1402600. [4] Supported by NSF Grants DMS-1454377, IIS-1332109, IIS-1408910 and IIS-1546482, and NIH Grants R01-MH102339 and R01-GM083084. MSC2010 subject classifications: primary 62F03, 62F35; secondary 62J05, 62E17. Key words and phrases: approximate factor model, Bahadur representation, false discovery proportion, heavy-tailed data, Huber loss, large-scale multiple testing, M-estimator.
Publisher Copyright:
© Institute of Mathematical Statistics, 2018.
PY - 2018/10
Y1 - 2018/10
AB - Heavy-tailed errors impair the accuracy of the least squares estimate, which can be spoiled by a single grossly outlying observation. As argued in the seminal work of Peter Huber in 1973 [Ann. Statist. 1 (1973) 799-821], robust alternatives to the method of least squares are sorely needed. To achieve robustness against heavy-tailed sampling distributions, we revisit the Huber estimator from a new perspective by letting the tuning parameter involved diverge with the sample size. In this paper, we develop nonasymptotic concentration results for such an adaptive Huber estimator, namely, the Huber estimator with the tuning parameter adapted to sample size, dimension and the variance of the noise. Specifically, we obtain a sub-Gaussian-type deviation inequality and a nonasymptotic Bahadur representation when noise variables only have finite second moments. The nonasymptotic results further yield two conventional normal approximation results that are of independent interest, the Berry-Esseen inequality and Cramér-type moderate deviation. As an important application to large-scale simultaneous inference, we apply these robust normal approximation results to analyze a dependence-adjusted multiple testing procedure for moderately heavy-tailed data. It is shown that the robust dependence-adjusted procedure asymptotically controls the overall false discovery proportion at the nominal level under mild moment conditions. Thorough numerical results on both simulated and real datasets are also provided to back up our theory.
KW - Approximate factor model
KW - Bahadur representation
KW - False discovery proportion
KW - Heavy-tailed data
KW - Huber loss
KW - Large-scale multiple testing
KW - M-estimator
UR - http://www.scopus.com/inward/record.url?scp=85052617019&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052617019&partnerID=8YFLogxK
U2 - 10.1214/17-AOS1606
DO - 10.1214/17-AOS1606
M3 - Article
C2 - 30220745
AN - SCOPUS:85052617019
SN - 0090-5364
VL - 46
SP - 1904
EP - 1931
JO - Ann. Statist.
JF - The Annals of Statistics
IS - 5
ER -