Abstract
Deep learning models can be efficiently optimized via stochastic gradient descent in practice, but there is little theoretical evidence to explain why. A key question in optimization is to understand when the optimization landscape of a neural network is amenable to gradient-based optimization. We focus on a simple two-layer ReLU network with two hidden units and show that all local minimizers are global. Combined with recent work of Lee et al. (2016; 2017), this shows that gradient descent converges to a global minimizer.
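The setting studied in the abstract is a two-layer ReLU network with two hidden units trained by gradient descent. The following is a minimal illustrative sketch of that setting, not the paper's construction or proof: the data, teacher weights, step size, and fixed unit output weights are all assumptions made here for illustration.

```python
# Illustrative sketch (assumptions noted above): a two-layer ReLU network
# with two hidden units, trained by plain gradient descent on a squared loss.
import numpy as np

rng = np.random.default_rng(0)

d = 5                       # input dimension (assumed for illustration)
n = 200                     # number of training samples (assumed)
X = rng.standard_normal((n, d))

# Planted "teacher" network, so a zero-loss global minimizer exists.
W_true = rng.standard_normal((2, d))
y = np.maximum(X @ W_true.T, 0.0).sum(axis=1)   # output weights fixed to 1

def loss_and_grad(W):
    """Squared loss of the student network and its (sub)gradient w.r.t. W (2 x d)."""
    pre = X @ W.T                       # (n, 2) pre-activations
    act = np.maximum(pre, 0.0)          # ReLU
    resid = act.sum(axis=1) - y         # (n,) residuals
    loss = 0.5 * np.mean(resid ** 2)
    # Subgradient convention: ReLU'(z) = 1{z > 0}
    grad = ((resid[:, None] * (pre > 0)).T @ X) / n
    return loss, grad

W = rng.standard_normal((2, d))         # random initialization
lr = 0.1                                # step size (assumed)
for _ in range(2000):
    loss, grad = loss_and_grad(W)
    W -= lr * grad

print(f"final training loss: {loss:.6f}")
```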
| Original language | English (US) |
|---|---|
| State | Published - Jan 1 2018 |
| Externally published | Yes |
| Event | 6th International Conference on Learning Representations, ICLR 2018 - Vancouver, Canada Duration: Apr 30 2018 → May 3 2018 |
Conference
| Conference | 6th International Conference on Learning Representations, ICLR 2018 |
|---|---|
| Country/Territory | Canada |
| City | Vancouver |
| Period | 4/30/18 → 5/3/18 |
All Science Journal Classification (ASJC) codes
- Education
- Computer Science Applications
- Linguistics and Language
- Language and Linguistics