No spurious local minima in a two hidden unit Relu network

Jiajun Luo, Chenwei Wu, Jason D. Lee

Research output: Contribution to conferencePaperpeer-review

Abstract

Deep learning models can be efficiently optimized via stochastic gradient descent, but there is little theoretical evidence to support this. A key question in optimization is to understand when the optimization landscape of a neural network is amenable to gradient-based optimization. We focus on a simple neural network two-layer ReLU network with two hidden units, and show that all local minimizers are global. This combined with recent work of Lee et al. (2017); Lee et al. (2016) show that gradient descent converges to the global minimizer.

Original languageEnglish (US)
StatePublished - Jan 1 2018
Externally publishedYes
Event6th International Conference on Learning Representations, ICLR 2018 - Vancouver, Canada
Duration: Apr 30 2018May 3 2018

Conference

Conference6th International Conference on Learning Representations, ICLR 2018
Country/TerritoryCanada
CityVancouver
Period4/30/185/3/18

All Science Journal Classification (ASJC) codes

  • Education
  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Fingerprint

Dive into the research topics of 'No spurious local minima in a two hidden unit Relu network'. Together they form a unique fingerprint.

Cite this