TY - CONF
T1 - Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
AU - Arora, Sanjeev
AU - Du, Simon S.
AU - Li, Zhiyuan
AU - Salakhutdinov, Ruslan
AU - Wang, Ruosong
AU - Yu, Dingli
N1 - Funding Information:
S. Arora, Z. Li and D. Yu are supported by NSF, ONR, Simons Foundation, Schmidt Foundation, Amazon Research, DARPA and SRC. S. S. Du is supported by National Science Foundation (Grant No. DMS-1638352) and the Infosys Membership. R. Salakhutdinov and R. Wang are supported in part by NSF IIS-1763562, AFRL CogDeCON FA875018C0014, and DARPA SAGAMORE HR00111990016. Part of this work was done while S. S. Du was visiting Google Brain Princeton and R. Wang was visiting Princeton University. The authors would like to thank Amazon Web Services for providing compute time for the experiments in this paper, and NVIDIA for GPU support. We thank Priya Goyal for providing experiment details of Goyal et al. (2019). We thank Xiaolong Wang for discussing the few-shot learning task.
Publisher Copyright:
© 2020 8th International Conference on Learning Representations, ICLR 2020. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Recent research shows that the following two models are equivalent: (a) infinitely wide neural networks (NNs) trained under ℓ2 loss by gradient descent with infinitesimally small learning rate, and (b) kernel regression with respect to so-called Neural Tangent Kernels (NTKs) (Jacot et al., 2018). An efficient algorithm to compute the NTK, as well as its convolutional counterparts, appears in Arora et al. (2019a), which allowed studying the performance of infinitely wide nets on datasets like CIFAR-10. However, the super-quadratic running time of kernel methods makes them best suited for small-data tasks. We report results suggesting that neural tangent kernels perform strongly on low-data tasks. 1. On a standard testbed of classification/regression tasks from the UCI database, NTK SVM beats the previous gold standard, Random Forests (RF), and also the corresponding finite nets. 2. On CIFAR-10 with 10-640 training samples, Convolutional NTK consistently beats ResNet-34 by 1%-3%. 3. On the VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance. 4. Comparing the performance of the NTK with the finite-width net it was derived from, NTK behavior starts at lower net widths than suggested by theoretical analysis (Arora et al., 2019a). The NTK's efficacy may trace to lower variance of its output.
UR - http://www.scopus.com/inward/record.url?scp=85150651741&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85150651741&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85150651741
T2 - 8th International Conference on Learning Representations, ICLR 2020
Y2 - 30 April 2020
ER -