TY - JOUR
T1 - How SGD selects the global minima in over-parameterized learning
T2 - 32nd Conference on Neural Information Processing Systems, NeurIPS 2018
AU - Wu, Lei
AU - Ma, Chao
AU - E, Weinan
N1 - Funding Information:
We are grateful to Zhanxing Zhu for very helpful discussions. The work performed here is supported in part by ONR grant N00014-13-1-0338 and the Major Program of NNSFC under grant 91130005.
Publisher Copyright:
© 2018 Curran Associates Inc. All rights reserved.
PY - 2018
Y1 - 2018
N2 - The question of which global minima are accessible by a stochastic gradient descent (SGD) algorithm with specific learning rate and batch size is studied from the perspective of dynamical stability. The concept of non-uniformity is introduced, which, together with sharpness, characterizes the stability property of a global minimum and hence the accessibility of a particular SGD algorithm to that global minimum. In particular, this analysis shows that learning rate and batch size play different roles in minima selection. Extensive empirical results seem to correlate well with the theoretical findings and provide further support to these claims.
AB - The question of which global minima are accessible by a stochastic gradient descent (SGD) algorithm with specific learning rate and batch size is studied from the perspective of dynamical stability. The concept of non-uniformity is introduced, which, together with sharpness, characterizes the stability property of a global minimum and hence the accessibility of a particular SGD algorithm to that global minimum. In particular, this analysis shows that learning rate and batch size play different roles in minima selection. Extensive empirical results seem to correlate well with the theoretical findings and provide further support to these claims.
UR - http://www.scopus.com/inward/record.url?scp=85064847800&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064847800&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85064847800
SN - 1049-5258
VL - 2018-December
SP - 8279
EP - 8288
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 2 December 2018 through 8 December 2018
ER -