TY - JOUR
T1 - Beyond lazy training for over-parameterized tensor decomposition
AU - Wang, Xiang
AU - Wu, Chenwei
AU - Lee, Jason D.
AU - Ma, Tengyu
AU - Ge, Rong
N1 - Funding Information:
RG acknowledges support from NSF CCF-1704656, NSF CCF-1845171 (CAREER), NSF CCF-1934964 (TRIPODS), NSF-Simons Research Collaborations on the Mathematical and Scientific Foundations of Deep Learning, a Sloan Fellowship, and a Google Faculty Research Award.
Funding Information:
JDL acknowledges support of the ARO under MURI Award W911NF-11-1-0303, the Sloan Research Fellowship, and NSF CCF-2002272. This is part of the collaboration between the US DOD, UK MOD, and the UK Engineering and Physical Sciences Research Council (EPSRC) under the Multidisciplinary University Research Initiative.
Funding Information:
TM acknowledges support of a Google Faculty Award. The work is also partially supported by SDSI and SAIL at Stanford.
Publisher Copyright:
© 2020 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optimal solutions. In this paper we study a closely related tensor decomposition problem: given an l-th order tensor in (R^d)^⊗l of rank r (where r ≪ d), can variants of gradient descent find a rank-m decomposition where m > r? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least m = Ω(d^(l-1)), while a variant of gradient descent can find an approximate tensor when m = O*(r^(2.5l) log d). Our results show that gradient descent on an over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.
AB - Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optimal solutions. In this paper we study a closely related tensor decomposition problem: given an l-th order tensor in (R^d)^⊗l of rank r (where r ≪ d), can variants of gradient descent find a rank-m decomposition where m > r? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least m = Ω(d^(l-1)), while a variant of gradient descent can find an approximate tensor when m = O*(r^(2.5l) log d). Our results show that gradient descent on an over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.
UR - http://www.scopus.com/inward/record.url?scp=85108404512&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108404512&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85108404512
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
Y2 - 6 December 2020 through 12 December 2020
ER -
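
Illustrative note (not part of the bibliographic record): below is a minimal sketch of plain gradient descent on the over-parameterized objective ||sum_{i=1}^m u_i^⊗3 - T||_F^2 described in the abstract, for the order l = 3 case. The choices of d, r, m, step size, initialization scale, and the use of PyTorch autograd are assumptions for illustration; this is not the paper's modified variant of gradient descent, and convergence to an exact decomposition is not guaranteed by this sketch.

import torch

torch.manual_seed(0)
d, r, m = 20, 3, 30          # ambient dimension, true rank, over-parameterized rank (illustrative)
lr, steps = 0.02, 3000       # step size and iteration count (illustrative)

# Ground-truth symmetric order-3 tensor T = sum_j a_j ⊗ a_j ⊗ a_j, scaled to unit Frobenius norm.
A = torch.randn(r, d)
T = torch.einsum('ja,jb,jc->abc', A, A, A)
T = T / T.norm()

# Over-parameterized factors: m > r components with small random initialization.
U = (0.02 * torch.randn(m, d)).requires_grad_(True)

for step in range(steps):
    # T_hat = sum_i u_i ⊗ u_i ⊗ u_i, the rank-m candidate decomposition.
    T_hat = torch.einsum('ia,ib,ic->abc', U, U, U)
    loss = (T_hat - T).pow(2).sum()      # squared Frobenius-norm error
    loss.backward()
    with torch.no_grad():
        U -= lr * U.grad                 # plain gradient descent step
        U.grad.zero_()
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss.item():.4e}")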