Skip to main navigation Skip to search Skip to main content

Gradient descent can take exponential time to Escape saddle points

  • Simon S. Du
  • , Chi Jin
  • , Jason D. Lee
  • , Michael I. Jordan
  • , Barnabás Póczos
  • , Aarti Singh

Research output: Contribution to journalConference articlepeer-review

Abstract

Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not slowed down by saddle points-it can find an approximate local minimizer in polynomial time. This result implies that GD is inherently slower than perturbed GD, and justifies the importance of adding perturbations for efficient non-convex optimization. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.

Original languageEnglish (US)
Pages (from-to)1068-1078
Number of pages11
JournalAdvances in Neural Information Processing Systems
Volume2017-December
StatePublished - 2017
Externally publishedYes
Event31st Annual Conference on Neural Information Processing Systems, NIPS 2017 - Long Beach, United States
Duration: Dec 4 2017Dec 9 2017

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint

Dive into the research topics of 'Gradient descent can take exponential time to Escape saddle points'. Together they form a unique fingerprint.

Cite this