A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics

E. Weinan, Chao Ma, Lei Wu

Research output: Contribution to journal › Article › peer-review

42 Scopus citations

Abstract

A fairly comprehensive analysis is presented of the gradient descent dynamics for training two-layer neural network models in the setting where the parameters in both layers are updated. General initialization schemes as well as general regimes for the network width and training data size are considered. In the over-parametrized regime, it is shown that gradient descent dynamics achieves zero training loss exponentially fast regardless of the quality of the labels. In addition, it is proved that throughout the training process the functions represented by the neural network model remain uniformly close to those of a kernel method. For general values of the network width and training data size, sharp estimates of the generalization error are established for target functions in the appropriate reproducing kernel Hilbert space.
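The over-parametrized phenomenon described in the abstract can be illustrated with a minimal numerical sketch: full-batch gradient descent on a two-layer ReLU network with both layers updated, compared against the corresponding random feature model in which the inner-layer weights stay frozen at initialization. All concrete choices below (the sizes `n`, `d`, `m`, the `1/sqrt(m)` output scaling, the learning rate and step count) are illustrative assumptions, not the parametrization used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative over-parametrized setup (sizes are assumptions, not from the paper):
# n training points in d dimensions, network width m >> n.
n, d, m = 10, 5, 500
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)  # arbitrary labels: over-parametrized GD fits them anyway

# Shared initialization for f(x) = (1/sqrt(m)) * sum_k a_k * relu(b_k . x).
a0 = rng.standard_normal(m)
B0 = rng.standard_normal((m, d))

def train(update_B, lr=0.2, steps=5000):
    """Full-batch gradient descent on the mean squared loss.

    update_B=True  -> both layers trained (two-layer neural network model)
    update_B=False -> inner weights frozen (random feature model)
    """
    a, B = a0.copy(), B0.copy()
    for _ in range(steps):
        H = np.maximum(X @ B.T, 0.0)            # hidden activations, shape (n, m)
        r = H @ a / np.sqrt(m) - y              # residuals, shape (n,)
        grad_a = H.T @ r / (np.sqrt(m) * n)
        if update_B:
            M = (H > 0) * r[:, None]            # relu'(.) times residual, (n, m)
            grad_B = (M * a[None, :]).T @ X / (np.sqrt(m) * n)
            B -= lr * grad_B
        a -= lr * grad_a
    H = np.maximum(X @ B.T, 0.0)
    return 0.5 * np.mean((H @ a / np.sqrt(m) - y) ** 2)

nn_loss = train(update_B=True)
rf_loss = train(update_B=False)
print(f"two-layer NN loss: {nn_loss:.2e}, random-feature loss: {rf_loss:.2e}")
```

Since the width m greatly exceeds the number of samples n, both dynamics drive the training loss toward zero even for pure-noise labels, matching the abstract's claim that label quality plays no role in the over-parametrized regime; the near-coincidence of the two trajectories reflects the proximity of the trained network to a kernel method.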

Original language: English (US)
Pages (from-to): 1235-1258
Number of pages: 24
Journal: Science China Mathematics
Volume: 63
Issue number: 7
DOIs
State: Published - Jul 1 2020

All Science Journal Classification (ASJC) codes

  • General Mathematics

Keywords

  • 41A99
  • 49M99
  • Gram matrix
  • generalization error
  • random feature model
  • two-layer neural network
