TY - CONF
T1 - Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks
T2 - 8th International Conference on Learning Representations, ICLR 2020
AU - Bai, Yu
AU - Lee, Jason D.
N1 - Funding Information:
The authors would like to thank Wei Hu, Tengyu Ma, Song Mei, and Andrea Montanari for their insightful comments. JDL acknowledges support from the ARO under MURI Award W911NF-11-1-0303, the Sloan Research Fellowship, and NSF CCF #1900145. The authors also thank the Simons Institute Summer 2019 program on the Foundations of Deep Learning, and the Institute for Advanced Study Special Year on Optimization, Statistics, and Theoretical Machine Learning for hosting the authors.
Publisher Copyright:
© 2020 8th International Conference on Learning Representations, ICLR 2020. All rights reserved.
PY - 2020
Y1 - 2020
AB - Recent theoretical work has established connections between over-parametrized neural networks and linearized models governed by Neural Tangent Kernels (NTKs). NTK theory leads to concrete convergence and generalization results, yet the empirical performance of neural networks is observed to exceed that of their linearized models, suggesting that this theory is insufficient. Towards closing this gap, we investigate the training of over-parametrized neural networks that are beyond the NTK regime yet still governed by the Taylor expansion of the network. We bring forward the idea of randomizing the neural networks, which allows them to escape their NTK and couple with quadratic models. We show that the optimization landscape of randomized two-layer networks is nice and amenable to escaping-saddle algorithms. We prove concrete generalization and expressivity results for these randomized networks, which lead to sample complexity bounds (for learning certain simple functions) that match the NTK and can additionally be better by a dimension factor under mild distributional assumptions. We demonstrate that our randomization technique can be generalized systematically beyond the quadratic case, by using it to find networks that are coupled with higher-order terms in their Taylor series.
UR - http://www.scopus.com/inward/record.url?scp=85093385943&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85093385943&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85093385943
Y2 - 30 April 2020
ER -