On the Power of Over-parametrization in Neural Networks with Quadratic Activation

Simon S. Du, Jason D. Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

48 Scopus citations

Abstract

We provide new theoretical insights on why over-parametrization is effective in learning neural networks. For a k-hidden-node shallow network with quadratic activation and n training data points, we show that as long as k ≥ √(2n), over-parametrization enables local search algorithms to find a globally optimal solution for general smooth and convex loss functions. Further, despite the number of parameters possibly exceeding the sample size, we use the theory of Rademacher complexity to show that, with weight decay, the solution also generalizes well if the data is sampled from a regular distribution such as a Gaussian. To prove that when k ≥ √(2n) the loss function has benign landscape properties, we adopt an idea from smoothed analysis, which may have other applications in studying loss surfaces of neural networks.
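The model studied in the abstract can be sketched concretely. Below is a minimal, hedged illustration of a k-hidden-node shallow network with quadratic activation, f(x) = Σⱼ aⱼ (wⱼᵀx)², trained by plain gradient descent on a squared loss with weight decay; the dimensions, fixed output weights, hyperparameters, and the specific loss are illustrative assumptions, not the paper's exact experimental setup.

```python
import torch

# Sketch (assumptions noted above): over-parametrized shallow network with
# quadratic activation, f(x) = sum_j a_j * (w_j^T x)^2, trained by gradient
# descent on a smooth convex loss (squared error) plus weight decay.

torch.manual_seed(0)
n, d = 64, 16                        # n training points in d dimensions (assumed)
k = int((2 * n) ** 0.5) + 1          # over-parametrization level k >= sqrt(2n)

X = torch.randn(n, d)                # Gaussian inputs (a "regular" distribution)
W_true = torch.randn(3, d)           # small ground-truth network (assumed)
y = ((X @ W_true.T) ** 2).sum(dim=1)

W = torch.randn(k, d, requires_grad=True)   # hidden-layer weights w_1, ..., w_k
a = torch.ones(k)                           # output weights fixed to 1 (assumption)
lr, wd = 1e-3, 1e-3                         # step size and weight-decay strength (assumed)

for step in range(2000):
    pred = ((X @ W.T) ** 2) @ a             # f(x_i) = sum_j a_j (w_j^T x_i)^2
    loss = ((pred - y) ** 2).mean() + wd * (W ** 2).sum()
    loss.backward()
    with torch.no_grad():
        W -= lr * W.grad                    # local search: plain gradient step
        W.grad.zero_()
```

The point of the sketch is only to make the setup tangible: with k at least on the order of √(2n), the paper's result says such local search is not trapped in spurious local minima, and the weight-decay term is what the generalization argument via Rademacher complexity relies on.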

Original language: English (US)
Title of host publication: 35th International Conference on Machine Learning, ICML 2018
Editors: Andreas Krause, Jennifer Dy
Publisher: International Machine Learning Society (IMLS)
Pages: 2132-2141
Number of pages: 10
ISBN (Electronic): 9781510867963
State: Published - 2018
Externally published: Yes
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018 - Jul 15 2018

Publication series

Name: 35th International Conference on Machine Learning, ICML 2018
Volume: 3

Other

Other: 35th International Conference on Machine Learning, ICML 2018
Country/Territory: Sweden
City: Stockholm
Period: 7/10/18 - 7/15/18

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software
