TY - JOUR
T1 - On the theory of transfer learning
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
AU - Tripuraneni, Nilesh
AU - Jordan, Michael I.
AU - Jin, Chi
N1 - Funding Information:
The authors thank Yeshwanth Cherapanamjeri for useful discussions. NT thanks the RISELab at U.C. Berkeley for support. In addition, this work was supported by the Army Research Office (ARO) under contract W911NF-17-1-0304 as part of the collaboration between US DOD, UK MOD and UK Engineering and Physical Research Council (EPSRC) under the Multidisciplinary University Research Initiative (MURI).
Publisher Copyright:
© 2020 Neural information processing systems foundation. All rights reserved.
PY - 2020
Y1 - 2020
AB - We provide new statistical guarantees for transfer learning via representation learning, in which transfer is achieved by learning a feature representation shared across different tasks. This enables learning on new tasks using far less data than is required to learn them in isolation. Formally, we consider t + 1 tasks parameterized by functions of the form f_j ∘ h in a general function class F ∘ H, where each f_j is a task-specific function in F and h is the shared representation in H. Letting C(·) denote the complexity measure of the function class, we show that for diverse training tasks (1) the sample complexity needed to learn the shared representation across the first t training tasks scales as C(H) + t C(F), despite no explicit access to a signal from the feature representation, and (2) with an accurate estimate of the representation, the sample complexity needed to learn a new task scales only with C(F). Our results depend upon a new general notion of task diversity, applicable to models with general tasks, features, and losses, as well as a novel chain rule for Gaussian complexities. Finally, we exhibit the utility of our general framework in several models of importance in the literature.
UR - http://www.scopus.com/inward/record.url?scp=85107865180&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107865180&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85107865180
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 6 December 2020 through 12 December 2020
ER -