FINITE DEPTH AND WIDTH CORRECTIONS TO THE NEURAL TANGENT KERNEL

Boris Hanin, Mihai Nica

Research output: Contribution to conference › Paper › peer-review

42 Scopus citations

Abstract

We prove the precise scaling, at finite depth and width, for the mean and variance of the neural tangent kernel (NTK) in a randomly initialized ReLU network. The standard deviation is exponential in the ratio of network depth to width. Thus, even in the limit of infinite overparameterization, the NTK is not deterministic if depth and width simultaneously tend to infinity. Moreover, we prove that for such deep and wide networks, the NTK has a non-trivial evolution during training by showing that the mean of its first SGD update is also exponential in the ratio of network depth to width. This is in sharp contrast to the regime where depth is fixed and network width is very large. Our results suggest that, unlike relatively shallow and wide networks, deep and wide ReLU networks are capable of learning data-dependent features even in the so-called lazy training regime.
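The abstract's central quantity is the NTK at initialization, Θ(x, x') = Σ_θ (∂f/∂θ)(x) · (∂f/∂θ)(x'), whose fluctuations over random initializations the paper shows grow exponentially in depth/width. The sketch below is not from the paper; it is a minimal empirical illustration of that quantity, assuming a He-initialized ReLU MLP, with illustrative sizes (depth, width, n_trials) chosen arbitrarily.

```python
# Minimal empirical sketch (illustrative only): estimate the spread of the
# on-diagonal NTK entry Theta(x, x) across random initializations of a ReLU MLP.
import jax
import jax.numpy as jnp

def init_params(key, depth, width, in_dim=1):
    dims = [in_dim] + [width] * depth + [1]
    keys = jax.random.split(key, len(dims) - 1)
    # He initialization (weight variance 2 / fan_in), standard for ReLU at init.
    return [jax.random.normal(k, (dims[i + 1], dims[i])) * jnp.sqrt(2.0 / dims[i])
            for i, k in enumerate(keys)]

def forward(params, x):
    h = x
    for W in params[:-1]:
        h = jax.nn.relu(W @ h)
    return (params[-1] @ h)[0]          # scalar network output f(x)

def ntk_on_diagonal(params, x):
    # Theta(x, x) = sum over all parameters of (df/dtheta)^2.
    grads = jax.grad(forward)(params, x)
    return sum(jnp.sum(g ** 2) for g in grads)

x = jnp.ones((1,))
depth, width, n_trials = 10, 50, 200    # hypothetical sizes, not from the paper
keys = jax.random.split(jax.random.PRNGKey(0), n_trials)
vals = jnp.array([ntk_on_diagonal(init_params(k, depth, width), x) for k in keys])
print("mean Theta(x,x):", vals.mean(), "  std/mean:", vals.std() / vals.mean())
```

Under the paper's result, the reported std/mean ratio should be non-negligible when depth/width is of order one and shrink as width grows at fixed depth; this snippet only probes a single input and a single depth/width pair, so it is a sanity check rather than a reproduction of the theorem.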

Original language: English (US)
State: Published - 2020
Externally published: Yes
Event: 8th International Conference on Learning Representations, ICLR 2020 - Addis Ababa, Ethiopia
Duration: Apr 30 2020 → …

Conference

Conference: 8th International Conference on Learning Representations, ICLR 2020
Country/Territory: Ethiopia
City: Addis Ababa
Period: 4/30/20 → …

All Science Journal Classification (ASJC) codes

  • Education
  • Linguistics and Language
  • Language and Linguistics
  • Computer Science Applications
