Optimization-Based Separations for Neural Networks

Itay Safran, Jason D. Lee

Research output: Contribution to journalConference articlepeer-review

4 Scopus citations


Depth separation results propose a possible theoretical explanation for the benefits of deep neural networks over shallower architectures, establishing that the former possess superior approximation capabilities. However, there are no known results in which the deeper architecture leverages this advantage into a provable optimization guarantee. We prove that when the data are generated by a distribution with radial symmetry which satisfies some mild assumptions, gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations, and where the hidden layer is held fixed throughout training. By building on and refining existing techniques for approximation lower bounds of neural networks with a single layer of non-linearities, we show that there are d-dimensional radial distributions on the data such that ball indicators cannot be learned efficiently by any algorithm to accuracy better than Ω(d−4), nor by a standard gradient descent implementation to accuracy better than a constant. These results establish what is to the best of our knowledge, the first optimization-based separations where the approximation benefits of the stronger architecture provably manifest in practice. Our proof technique introduces new tools and ideas that may be of independent interest in the theoretical study of both the approximation and optimization of neural networks.

Original languageEnglish (US)
Pages (from-to)3-64
Number of pages62
JournalProceedings of Machine Learning Research
StatePublished - 2022
Event35th Conference on Learning Theory, COLT 2022 - London, United Kingdom
Duration: Jul 2 2022Jul 5 2022

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability


Dive into the research topics of 'Optimization-Based Separations for Neural Networks'. Together they form a unique fingerprint.

Cite this