TY - GEN

T1 - A sample complexity separation between non-convex and convex meta-learning

AU - Saunshi, Nikunj

AU - Zhang, Yi

AU - Khodak, Mikhail

AU - Arora, Sanjeev

N1 - Publisher Copyright:
Copyright © 2020 by the Authors. All rights reserved.

PY - 2020

Y1 - 2020

N2 - One popular trend in meta-learning is to learn from many training tasks a common initialization that a gradient-based method can use to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several recent learning-theoretic analyses of methods such as Reptile (Nichol et al., 2018) being for convex models. This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any initialization-based meta-learning algorithm is Ω(d), where d is the input dimension. In contrast, for the non-convex formulation of a two-layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of O(1), demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.

AB - One popular trend in meta-learning is to learn from many training tasks a common initialization that a gradient-based method can use to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several recent learning-theoretic analyses of methods such as Reptile (Nichol et al., 2018) being for convex models. This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any initialization-based meta-learning algorithm is Ω(d), where d is the input dimension. In contrast, for the non-convex formulation of a two-layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of O(1), demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.

UR - http://www.scopus.com/inward/record.url?scp=85105360503&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85105360503&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85105360503

T3 - 37th International Conference on Machine Learning, ICML 2020

SP - 8470

EP - 8479

BT - 37th International Conference on Machine Learning, ICML 2020

A2 - Daume, Hal

A2 - Singh, Aarti

PB - International Machine Learning Society (IMLS)

T2 - 37th International Conference on Machine Learning, ICML 2020

Y2 - 13 July 2020 through 18 July 2020

ER -