TY - GEN
T1 - Variance reduction for faster non-convex optimization
AU - Allen-Zhu, Zeyuan
AU - Hazan, Elad
N1 - Funding Information:
E. Hazan acknowledges support from the National Science Foundation grant IIS-1523815 and a Google research award. Z. Allen-Zhu acknowledges support from a Microsoft research award, no. 0518584.
PY - 2016
Y1 - 2016
N2 - We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remain to be full gradient descent that converges in O(1/ε) iterations for smooth objectives, and stochastic gradient descent that converges in O(1/ε^2) iterations for objectives that are sum of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sum of smooth functions, our first-order minibatch stochastic method converges with an O(1/ε) rate, and is faster than full gradient descent by Ω(n^{1/3}). We demonstrate the effectiveness of our methods on empirical risk minimizations with non-convex loss functions and training neural nets.
AB - We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remain to be full gradient descent that converges in O(1/ε) iterations for smooth objectives, and stochastic gradient descent that converges in O(1/ε^2) iterations for objectives that are sum of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sum of smooth functions, our first-order minibatch stochastic method converges with an O(1/ε) rate, and is faster than full gradient descent by Ω(n^{1/3}). We demonstrate the effectiveness of our methods on empirical risk minimizations with non-convex loss functions and training neural nets.
UR - http://www.scopus.com/inward/record.url?scp=84999029527&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84999029527&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84999029527
T3 - 33rd International Conference on Machine Learning, ICML 2016
SP - 1093
EP - 1101
BT - 33rd International Conference on Machine Learning, ICML 2016
A2 - Balcan, Maria Florina
A2 - Weinberger, Kilian Q.
PB - International Machine Learning Society (IMLS)
T2 - 33rd International Conference on Machine Learning, ICML 2016
Y2 - 19 June 2016 through 24 June 2016
ER -