Variance reduction for faster non-convex optimization

Zeyuan Allen-Zhu, Elad Hazan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

101 Scopus citations

Abstract

We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remain full gradient descent, which converges in O(1/ε) iterations for smooth objectives, and stochastic gradient descent, which converges in O(1/ε²) iterations for objectives that are sums of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand-new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sums of smooth functions, our first-order minibatch stochastic method converges at an O(1/ε) rate and is faster than full gradient descent by a factor of Ω(n^{1/3}). We demonstrate the effectiveness of our methods on empirical risk minimization with non-convex loss functions and on training neural nets.
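For readers who want the mechanics behind the "variance reduction trick" the abstract mentions, the sketch below shows the standard SVRG-style estimator: compute a full gradient at a periodic snapshot point, then correct cheap per-component stochastic gradients with it. This is a hypothetical illustration of the general technique, not the paper's exact minibatch algorithm or its constants; the function names, step size eta, and inner-loop length m are assumptions made for the example.

```python
import numpy as np

def svrg_nonconvex(grad_i, x0, n, eta=0.05, num_epochs=20, m=None, seed=0):
    """Minimal SVRG-style loop for minimizing f(x) = (1/n) * sum_i f_i(x).

    `grad_i(x, i)` returns the gradient of the i-th component f_i at x.
    Step size and epoch length are illustrative choices, not the
    constants from the paper's analysis.
    """
    rng = np.random.default_rng(seed)
    m = n if m is None else m            # inner-loop length per snapshot
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(num_epochs):
        snapshot = x.copy()
        # Full gradient at the snapshot, averaged over all n components.
        mu = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced estimator: unbiased for the full gradient,
            # with variance that shrinks as x stays close to the snapshot.
            g = grad_i(x, i) - grad_i(snapshot, i) + mu
            x -= eta * g
    return x

# Toy usage: minimize (1/n) * sum_i sin(a_i . x), a smooth non-convex sum.
A = np.random.default_rng(1).normal(size=(100, 5))
x_hat = svrg_nonconvex(lambda x, i: np.cos(A[i] @ x) * A[i], np.zeros(5), n=100)
```

The key design point is that the correction term grad_i(x, i) - grad_i(snapshot, i) cancels most of the per-sample noise once the iterate is near the snapshot, which is what lets a stochastic method match the O(1/ε) rate of full gradient descent while touching only one component per step.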

Original language: English (US)
Title of host publication: 33rd International Conference on Machine Learning, ICML 2016
Editors: Maria Florina Balcan, Kilian Q. Weinberger
Publisher: International Machine Learning Society (IMLS)
Pages: 1093-1101
Number of pages: 9
ISBN (Electronic): 9781510829008
State: Published - 2016
Event: 33rd International Conference on Machine Learning, ICML 2016 - New York City, United States
Duration: Jun 19 2016 - Jun 24 2016

Publication series

Name: 33rd International Conference on Machine Learning, ICML 2016
Volume: 2

Other

Other: 33rd International Conference on Machine Learning, ICML 2016
Country/Territory: United States
City: New York City
Period: 6/19/16 - 6/24/16

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Computer Networks and Communications
