Adaptive Huber Regression

Qiang Sun, Wen Xin Zhou, Jianqing Fan

Research output: Contribution to journal › Article › peer-review

152 Scopus citations

Abstract

Big data can easily be contaminated by outliers or contain variables with heavy-tailed distributions, which makes many conventional methods inadequate. To address this challenge, we propose the adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments for optimal tradeoff between bias and robustness. Our theoretical framework deals with heavy-tailed distributions with bounded $(1+\delta)$-th moment for any $\delta > 0$. We establish a sharp phase transition for robust estimation of regression parameters in both low and high dimensions: when $\delta \geq 1$, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime $0 < \delta < 1$ and the transition is smooth and optimal. In addition, we extend the methodology to allow both heavy-tailed predictors and observation noise. Simulation studies lend further support to the theory. In a genetic study of cancer cell lines that exhibit heavy-tailedness, the proposed methods are shown to be more robust and predictive. Supplementary materials for this article are available online.
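The idea that the robustification parameter should grow with the sample size rather than stay fixed can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not stated in the abstract: the function names `huber_regression` and `adaptive_tau` are hypothetical, the solver is plain iteratively reweighted least squares, the scaling rule `tau = c * sigma * sqrt(n / (d + log n))` is one plausible choice for the finite-variance case, and the noise scale is estimated from the median absolute deviation of least-squares residuals.

```python
import numpy as np


def huber_regression(X, y, tau, n_iter=200, tol=1e-10):
    """Minimize the Huber loss with threshold tau by iteratively
    reweighted least squares (an illustrative solver choice)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        abs_r = np.maximum(np.abs(r), 1e-12)      # avoid division by zero
        w = np.minimum(1.0, tau / abs_r)          # weight 1 inside [-tau, tau]
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def adaptive_tau(n, d, sigma, c=1.0):
    """Assumed rule of thumb: tau grows with n so that the bias from
    truncating large residuals shrinks as more data arrive."""
    return c * sigma * np.sqrt(n / (d + np.log(n)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 500, 5
    X = rng.normal(size=(n, d))
    beta_true = np.arange(1.0, d + 1.0)
    noise = rng.standard_t(df=2.5, size=n)        # heavy tails, finite variance
    y = X @ beta_true + noise

    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_ols
    # Robust scale estimate from the median absolute deviation (assumption).
    sigma_hat = 1.4826 * np.median(np.abs(resid - np.median(resid)))

    tau = adaptive_tau(n, d, sigma_hat)
    beta_huber = huber_regression(X, y, tau)
    print("tau:", tau)
    print("OLS error:  ", np.linalg.norm(beta_ols - beta_true))
    print("Huber error:", np.linalg.norm(beta_huber - beta_true))
```

With a very large `tau` every weight equals one and the procedure reduces to ordinary least squares; a small `tau` behaves more like least absolute deviations. The sketch only shows how a sample-size-dependent threshold sits between the two.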

Original language: English (US)
Pages (from-to): 254-265
Number of pages: 12
Journal: Journal of the American Statistical Association
Volume: 115
Issue number: 529
DOIs
State: Published - Jan 2 2020
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Keywords

  • Adaptive Huber regression
  • Bias and robustness tradeoff
  • Finite-sample inference
  • Heavy-tailed data
  • Nonasymptotic optimality
  • Phase transition
