Distributed testing and estimation under sparse high dimensional models

Heather Battey, Jianqing Fan, Han Liu, Junwei Lu, Ziwei Zhu

Research output: Contribution to journal › Article › peer-review

159 Scopus citations

Abstract

This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from k subsamples of size n/k, where n is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large k can be, as n grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with access to the full sample. Thorough numerical results are provided to back up the theory.

Original language: English (US)
Pages (from-to): 1352-1382
Number of pages: 31
Journal: Annals of Statistics
Volume: 46
Issue number: 3
State: Published - Jun 2018
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Keywords

  • Debiasing
  • Divide and conquer
  • Massive data
  • Thresholding
