Abstract
Deep learning models can be efficiently optimized via stochastic gradient descent, but there is little theoretical evidence to support this. A key question in optimization is to understand when the optimization landscape of a neural network is amenable to gradient-based optimization. We focus on a simple two-layer ReLU network with two hidden units and show that all local minimizers are global. Combined with recent work of Lee et al. (2017) and Lee et al. (2016), this shows that gradient descent converges to a global minimizer.
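To make the setting concrete, here is a minimal sketch (not the authors' code) of gradient descent on a two-layer ReLU network with two hidden units, the architecture named in the abstract. The teacher-network data model, fixed unit output weights, input dimension, and learning rate are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5                                  # input dimension (assumed)
w_true = rng.standard_normal((2, d))   # planted "teacher" weights (assumed data model)

def relu(z):
    return np.maximum(z, 0.0)

def model(W, X):
    # Two hidden ReLU units with output weights fixed to one (a simplification):
    # f(x) = relu(w1 . x) + relu(w2 . x)
    return relu(X @ W.T).sum(axis=1)

def loss_and_grad(W, X, y):
    # Squared loss and its (sub)gradient with respect to the hidden-layer weights.
    pre = X @ W.T                                        # (n, 2) pre-activations
    resid = relu(pre).sum(axis=1) - y                    # (n,) residuals
    grad = ((pre > 0) * resid[:, None]).T @ X / len(y)   # (2, d) subgradient
    return 0.5 * np.mean(resid ** 2), grad

# Synthetic data generated by the teacher network (illustrative assumption).
X = rng.standard_normal((1000, d))
y = model(w_true, X)

W = rng.standard_normal((2, d)) * 0.5  # random initialization
lr = 0.1
for step in range(2000):
    loss, grad = loss_and_grad(W, X, y)
    W -= lr * grad

print(f"final loss: {loss:.6f}")       # near zero when no spurious local minimum is encountered
```

Under the paper's claim that all local minimizers are global, runs like this one should drive the loss to (near) zero from generic initializations; the sketch is only meant to illustrate the optimization setup, not to reproduce the theoretical result.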
| Original language | English (US) |
|---|---|
| State | Published - Jan 1 2018 |
| Externally published | Yes |
| Event | 6th International Conference on Learning Representations, ICLR 2018 - Vancouver, Canada. Duration: Apr 30 2018 → May 3 2018 |
Conference
| Conference | 6th International Conference on Learning Representations, ICLR 2018 |
|---|---|
| Country/Territory | Canada |
| City | Vancouver |
| Period | 4/30/18 → 5/3/18 |
All Science Journal Classification (ASJC) codes
- Education
- Computer Science Applications
- Linguistics and Language
- Language and Linguistics