Abstract
In this paper we adopt the diffusion approximation perspective to investigate Stochastic Gradient Descent (SGD) for least squares, which allows us to characterize the exact dynamics of the online regression process. As a consequence, we show that SGD achieves the optimal rate of convergence, up to a logarithmic factor. We further show that SGD combined with trajectory averaging achieves a faster rate by eliminating the logarithmic factor. We extend SGD to the high-dimensional setting by proposing a two-step algorithm: a burn-in step using offline learning and a refinement step using a variant of truncated stochastic gradient descent. Under appropriate assumptions, we show that the proposed algorithm produces near-optimal sparse estimators. Numerical experiments lend further support to our theory.
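To make the ingredients of the abstract concrete, the sketch below illustrates online SGD for least squares with a trajectory (Polyak–Ruppert) average, plus a hard-thresholding helper as a crude stand-in for the truncation used in the high-dimensional refinement step. This is a minimal illustration under assumed choices (the `eta0 / sqrt(t)` step size, the running-average rule, and the `gaussian_stream` data generator are all hypothetical), not the exact algorithm or schedule analyzed in the paper.

```python
import numpy as np

def sgd_least_squares(stream, d, n_steps, eta0=0.5, average=True):
    """Online SGD for least squares with optional trajectory
    (Polyak-Ruppert) averaging.  The eta0 / sqrt(t) step size and the
    averaging rule are illustrative assumptions, not the paper's exact choices."""
    theta = np.zeros(d)        # current SGD iterate
    theta_bar = np.zeros(d)    # running average of the trajectory
    for t in range(1, n_steps + 1):
        x, y = next(stream)                    # one streaming observation (x_t, y_t)
        grad = (x @ theta - y) * x             # gradient of 0.5 * (x' theta - y)^2
        theta = theta - eta0 / np.sqrt(t) * grad
        theta_bar += (theta - theta_bar) / t   # online trajectory average
    return theta_bar if average else theta

def hard_threshold(theta, s):
    """Keep the s largest-magnitude coordinates and zero the rest --
    a simplified stand-in for the truncation step in the high-dimensional variant."""
    out = np.zeros_like(theta)
    keep = np.argsort(np.abs(theta))[-s:]
    out[keep] = theta[keep]
    return out

# Hypothetical usage on synthetic Gaussian data.
def gaussian_stream(theta_star, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    while True:
        x = rng.standard_normal(d)
        yield x, x @ theta_star + noise * rng.standard_normal()

theta_star = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
est = sgd_least_squares(gaussian_stream(theta_star), d=5, n_steps=50000)
sparse_est = hard_threshold(est, s=2)
```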
| Original language | English (US) |
| --- | --- |
| Pages | 1017-1026 |
| Number of pages | 10 |
| State | Published - Jan 1 2018 |
| Event | 21st International Conference on Artificial Intelligence and Statistics, AISTATS 2018 - Playa Blanca, Lanzarote, Canary Islands, Spain |
| Duration | Apr 9 2018 → Apr 11 2018 |
Conference
| Conference | 21st International Conference on Artificial Intelligence and Statistics, AISTATS 2018 |
| --- | --- |
| Country/Territory | Spain |
| City | Playa Blanca, Lanzarote, Canary Islands |
| Period | 4/9/18 → 4/11/18 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Artificial Intelligence