TY - JOUR
T1 - Communication-Efficient Accurate Statistical Estimation
AU - Fan, Jianqing
AU - Guo, Yongyi
AU - Wang, Kaizheng
N1 - Funding Information:
We gratefully acknowledge NSF grants DMS-1662139, DMS-1712591, DMS-2053832, DMS-2052926, NIH grant 2R01-GM072611-15, and ONR grant N00014-19-1-2120. We acknowledge computing resources from Columbia University’s Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20-RR030893-01, and associated funds from the New York State Empire State Development, Division of Science Technology and Innovation (NYSTAR) Contract C090171, both awarded April 15, 2010.
Publisher Copyright:
© 2021 American Statistical Association.
PY - 2021
Y1 - 2021
N2 - When the data are stored in a distributed manner, direct applications of traditional statistical inference procedures are often prohibitive due to communication costs and privacy concerns. This article develops and investigates two communication-efficient accurate statistical estimators (CEASE), implemented through iterative algorithms for distributed optimization. In each iteration, node machines carry out computation in parallel and communicate with the central processor, which then broadcasts aggregated information to node machines for new updates. The algorithms adapt to the similarity among loss functions on node machines, and converge rapidly when each node machine has large enough sample size. Moreover, they do not require good initialization and enjoy linear convergence guarantees under general conditions. The contraction rate of optimization errors is presented explicitly, with dependence on the local sample size unveiled. In addition, the improved statistical accuracy per iteration is derived. By regarding the proposed method as a multistep statistical estimator, we show that statistical efficiency can be achieved in finite steps in typical statistical applications. In addition, we give the conditions under which the one-step CEASE estimator is statistically efficient. Extensive numerical experiments on both synthetic and real data validate the theoretical results and demonstrate the superior performance of our algorithms.
AB - When the data are stored in a distributed manner, direct applications of traditional statistical inference procedures are often prohibitive due to communication costs and privacy concerns. This article develops and investigates two communication-efficient accurate statistical estimators (CEASE), implemented through iterative algorithms for distributed optimization. In each iteration, node machines carry out computation in parallel and communicate with the central processor, which then broadcasts aggregated information to node machines for new updates. The algorithms adapt to the similarity among loss functions on node machines, and converge rapidly when each node machine has large enough sample size. Moreover, they do not require good initialization and enjoy linear convergence guarantees under general conditions. The contraction rate of optimization errors is presented explicitly, with dependence on the local sample size unveiled. In addition, the improved statistical accuracy per iteration is derived. By regarding the proposed method as a multistep statistical estimator, we show that statistical efficiency can be achieved in finite steps in typical statistical applications. In addition, we give the conditions under which the one-step CEASE estimator is statistically efficient. Extensive numerical experiments on both synthetic and real data validate the theoretical results and demonstrate the superior performance of our algorithms.
KW - Communication efficiency
KW - Distributed statistical estimation
KW - Multi-round algorithms
KW - Penalized likelihood
UR - http://www.scopus.com/inward/record.url?scp=85115611603&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85115611603&partnerID=8YFLogxK
U2 - 10.1080/01621459.2021.1969238
DO - 10.1080/01621459.2021.1969238
M3 - Article
C2 - 37347088
AN - SCOPUS:85115611603
SN - 0162-1459
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
ER -