Abstract
We devise a communication-efficient approach to distributed sparse regression in the high-dimensional setting. The key idea is to average "debiased" or "desparsified" lasso estimators. We show that the approach converges at the same rate as the lasso, provided the dataset is not split across too many machines, and that it consistently estimates the support under weaker conditions than the lasso. On the computational side, we propose a new parallel and computationally efficient algorithm to compute the approximate inverse covariance required by the debiasing approach when the dataset is split across samples. We further extend the approach to generalized linear models.
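The averaging step is simple enough to sketch. Below is a minimal, self-contained illustration in Python, under loud assumptions: the function names are hypothetical, scikit-learn's `Lasso` stands in for the local lasso solver, and a ridge-regularized inverse of each machine's sample covariance stands in for the approximate inverse covariance that the paper computes with its dedicated parallel algorithm.

```python
# A minimal sketch of one-shot averaging of debiased lasso estimators.
# All names here are hypothetical; the ridge-regularized inverse below is a
# crude stand-in for the approximate inverse covariance M that the paper
# computes with its own parallel algorithm.
import numpy as np
from sklearn.linear_model import Lasso

def debiased_lasso(X, y, lam, ridge=1e-2):
    """Fit the lasso on one machine's shard, then apply the one-step
    bias correction: beta_d = beta + M X^T (y - X beta) / n."""
    n, p = X.shape
    beta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    sigma_hat = X.T @ X / n                           # local sample covariance
    M = np.linalg.inv(sigma_hat + ridge * np.eye(p))  # stand-in for the paper's M
    return beta + M @ X.T @ (y - X @ beta) / n

def averaged_estimator(shards, lam):
    """One round of communication: each machine sends its debiased
    estimator, and the center averages them."""
    return np.mean([debiased_lasso(X, y, lam) for X, y in shards], axis=0)

# Toy run: a sparse signal, data split row-wise across 4 machines.
rng = np.random.default_rng(0)
p, k, n_per, machines = 50, 5, 200, 4
beta_true = np.zeros(p)
beta_true[:k] = 1.0
shards = []
for _ in range(machines):
    X = rng.standard_normal((n_per, p))
    y = X @ beta_true + 0.5 * rng.standard_normal(n_per)
    shards.append((X, y))

beta_bar = averaged_estimator(shards, lam=0.1)
print("l2 error of averaged debiased lasso:",
      np.linalg.norm(beta_bar - beta_true))
```

Note that the averaged estimator is dense; a standard follow-up with debiased estimators is to hard-threshold its entries to obtain a support estimate, though this sketch makes no claim about the paper's exact procedure.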
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 1-30 |
| Number of pages | 30 |
| Journal | Journal of Machine Learning Research |
| Volume | 18 |
| State | Published - Jan 1 2017 |
| Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Software
- Statistics and Probability
- Artificial Intelligence
Keywords
- Averaging
- Debiasing
- Distributed sparse regression
- High-dimensional statistics
- Lasso