Gradient descent finds global minima of deep neural networks

Simon S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Xiyu Zhai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

187 Scopus citations

Abstract

Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves zero training loss in polynomial time for a deep overparameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show the Gram matrix is stable throughout the training process, and this stability implies the global optimality of the gradient descent algorithm. We further extend our analysis to deep residual convolutional neural networks and obtain a similar convergence result.
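The sketch below is a minimal PyTorch illustration of the setting the abstract describes, not the paper's construction: it builds a small, heavily overparameterized residual network, computes the minimum eigenvalue of the Gram matrix H[i, j] = ⟨∇θ f(xi), ∇θ f(xj)⟩ at initialization and after training, and runs full-batch gradient descent on a squared training loss. The width, depth, learning rate, and random data are illustrative choices only and are not the quantities required by the theorem.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, width, depth = 20, 5, 256, 4          # n samples, input dim, hidden width, residual blocks
X = torch.randn(n, d)
y = torch.randn(n, 1)

class ResBlock(nn.Module):
    """One residual block: h -> h + relu(W h)."""
    def __init__(self, width):
        super().__init__()
        self.fc = nn.Linear(width, width)

    def forward(self, h):
        return h + torch.relu(self.fc(h))    # residual (skip) connection

model = nn.Sequential(
    nn.Linear(d, width),
    *[ResBlock(width) for _ in range(depth)],
    nn.Linear(width, 1),
)

def gram_min_eig(model, X):
    """Smallest eigenvalue of the Gram matrix H[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>."""
    rows = []
    for i in range(X.shape[0]):
        model.zero_grad()
        model(X[i:i + 1]).sum().backward()
        rows.append(torch.cat([p.grad.reshape(-1) for p in model.parameters()]))
    G = torch.stack(rows)
    return torch.linalg.eigvalsh(G @ G.T).min().item()

print("min Gram eigenvalue at init:", gram_min_eig(model, X))

# Full-batch gradient descent on the squared training loss.
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for step in range(2001):
    loss = ((model(X) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  train loss {loss.item():.6f}")

# With enough overparameterization the Gram matrix stays close to its value at
# initialization, which is the stability property the abstract refers to.
print("min Gram eigenvalue after training:", gram_min_eig(model, X))
```

In this toy run the training loss decreases monotonically toward zero and the minimum Gram eigenvalue stays bounded away from zero; the paper's contribution is proving that, at sufficient width, this behavior is guaranteed in polynomial time.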

Original language: English (US)
Title of host publication: 36th International Conference on Machine Learning, ICML 2019
Publisher: International Machine Learning Society (IMLS)
Pages: 3003-3048
Number of pages: 46
ISBN (Electronic): 9781510886988
State: Published - 2019
Externally published: Yes
Event: 36th International Conference on Machine Learning, ICML 2019 - Long Beach, United States
Duration: Jun 9 2019 - Jun 15 2019

Publication series

Name: 36th International Conference on Machine Learning, ICML 2019
Volume: 2019-June

Conference

Conference: 36th International Conference on Machine Learning, ICML 2019
Country/Territory: United States
City: Long Beach
Period: 6/9/19 - 6/15/19

All Science Journal Classification (ASJC) codes

  • Education
  • Computer Science Applications
  • Human-Computer Interaction

