TY - GEN
T1 - A sample complexity separation between non-convex and convex meta-learning
AU - Saunshi, Nikunj
AU - Zhang, Yi
AU - Khodak, Mikhail
AU - Arora, Sanjeev
N1 - Funding Information:
Sanjeev Arora, Nikunj Saunshi and Yi Zhang are supported by NSF, ONR, Simons Foundation, Schmidt Foundation, Amazon Research, DARPA and SRC. Mikhail Khodak is supported in part by NSF grants CCF-1535967, CCF-1910321, IIS-1618714, IIS-1705121, IIS1838017, and IIS-1901403, DARPA FA875017C0141, Amazon Research, Amazon Web Services, Microsoft Research, Bloomberg Data Science, the Okawa Foundation, Google, JP Morgan AI Research, and the Carnegie Bosch Institute.
Publisher Copyright:
Copyright © 2020 by the Authors. All rights reserved.
PY - 2020
Y1 - 2020
N2 - One popular trend in meta-learning is to learn from many training tasks a common initialization that a gradient-based method can use to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several recent learning-theoretic analyses of methods such as Reptile (Nichol et al., 2018) being for convex models. This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any initialization-based meta-learning algorithm is Ω(d), where d is the input dimension. In contrast, for the non-convex formulation of a two-layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of O(1), demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.
UR - http://www.scopus.com/inward/record.url?scp=85105360503&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105360503&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85105360503
T3 - 37th International Conference on Machine Learning, ICML 2020
SP - 8470
EP - 8479
BT - 37th International Conference on Machine Learning, ICML 2020
A2 - Daume, Hal
A2 - Singh, Aarti
PB - International Machine Learning Society (IMLS)
T2 - 37th International Conference on Machine Learning, ICML 2020
Y2 - 13 July 2020 through 18 July 2020
ER -