How SGD selects the global minima in over-parameterized learning: A dynamical stability perspective

Lei Wu, Chao Ma, E. Weinan

Research output: Contribution to journal › Conference article › peer-review


Abstract

The question of which global minima are accessible by a stochastic gradient descent (SGD) algorithm with a specific learning rate and batch size is studied from the perspective of dynamical stability. The concept of non-uniformity is introduced, which, together with sharpness, characterizes the stability property of a global minimum and hence the accessibility of a particular SGD algorithm to that global minimum. In particular, this analysis shows that learning rate and batch size play different roles in minima selection. Extensive empirical results seem to correlate well with the theoretical findings and provide further support for these claims.
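
As a rough illustration of the stability mechanism described in the abstract, the following Python sketch simulates SGD on a toy one-dimensional problem in which every per-sample loss shares the same global minimum but the per-sample curvatures differ. It is not the authors' code or their exact stability criterion; the curvature values, learning rates, and batch sizes are made-up illustrative choices, with the mean curvature standing in for sharpness and the curvature spread standing in for non-uniformity.

import numpy as np

# Toy setup (illustrative, not the paper's construction): each per-sample loss
# f_i(x) = 0.5 * h_i * x**2 has its minimum at x = 0, so x = 0 is a global
# minimum of the averaged loss. The h_i values below are hypothetical.
rng = np.random.default_rng(0)
n = 1000
h = rng.choice([0.2, 6.0], size=n, p=[0.7, 0.3])   # per-sample curvatures
print(f"sharpness ~ {h.mean():.2f}, non-uniformity ~ {h.std():.2f}")

def stays_near_minimum(lr, batch_size, steps=2000, x0=1e-3, blowup=1e6):
    """Run SGD on the toy problem; return True if the iterate stays bounded."""
    x = x0
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        x -= lr * h[idx].mean() * x        # mini-batch gradient step at x
        if abs(x) > blowup:
            return False
    return True

for lr in (0.2, 0.5):
    for B in (1, 16, n):
        print(f"lr={lr}, batch={B:>4}: stable={stays_near_minimum(lr, B)}")

# In a typical run, lr=0.2 keeps all batch sizes near x = 0, while at lr=0.5
# the full-batch and batch=16 runs remain stable but batch=1 is driven away,
# illustrating that learning rate and batch size constrain the set of
# reachable minima in different ways.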

Original language: English (US)
Pages (from-to): 8279-8288
Number of pages: 10
Journal: Advances in Neural Information Processing Systems
Volume: 2018-December
State: Published - 2018
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: Dec 2 2018 – Dec 8 2018

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

