ON THE PROVABLE ADVANTAGE OF UNSUPERVISED PRETRAINING

Jiawei Ge, Shange Tang, Jianqing Fan, Chi Jin

Research output: Contribution to conference › Paper › peer-review

Abstract

Unsupervised pretraining, which learns a useful representation from a large amount of unlabeled data to facilitate the learning of downstream tasks, is a critical component of modern large-scale machine learning systems. Despite its tremendous empirical success, rigorous theoretical understanding of why unsupervised pretraining generally helps remains rather limited: most existing results are restricted to particular methods or approaches for unsupervised pretraining with specialized structural assumptions. This paper studies a generic framework in which the unsupervised representation learning task is specified by an abstract class of latent variable models Φ and the downstream task is specified by a class of prediction functions Ψ. We consider a natural approach of using Maximum Likelihood Estimation (MLE) for unsupervised pretraining and Empirical Risk Minimization (ERM) for learning downstream tasks. We prove that, under a mild “informative” condition, our algorithm achieves an excess risk of O(√(CΦ/m) + √(CΨ/n)) for downstream tasks, where CΦ, CΨ are complexity measures of the function classes Φ, Ψ, and m, n are the numbers of unlabeled and labeled data points, respectively. Compared to the baseline of O(√(CΦ∘Ψ/n)) achieved by performing supervised learning using only the labeled data, our result rigorously shows the benefit of unsupervised pretraining when m ≫ n and CΦ∘Ψ > CΨ. This paper further shows that our generic framework covers a wide range of approaches for unsupervised pretraining, including factor models, Gaussian mixture models, and contrastive learning.
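The pipeline described in the abstract (MLE over a latent variable class Φ on unlabeled data, then ERM over a prediction class Ψ on labeled data) can be illustrated with a toy sketch. The following is a minimal, hedged example, not the paper's exact algorithm: it assumes scikit-learn, takes Φ to be a Gaussian mixture family (one of the settings the paper covers), uses the mixture's posterior responsibilities as an illustrative choice of learned representation, and takes Ψ to be linear classifiers fit by logistic-loss ERM.

```python
# Illustrative sketch (assumed setup, not the paper's exact construction):
# Step 1 fits a latent variable model from Phi by (approximate) MLE on m
# unlabeled points; Step 2 runs ERM over Psi on n labeled points using the
# learned representation phi(x) = posterior mixture responsibilities.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(num, labeled):
    # Toy data: two latent components; the label equals the latent component.
    z = rng.integers(0, 2, size=num)
    x = rng.normal(loc=3.0 * z[:, None], scale=1.0, size=(num, 5))
    return (x, z) if labeled else x

m, n = 50_000, 200                          # m >> n, the regime where pretraining helps
x_unlabeled = sample(m, labeled=False)
x_labeled, y_labeled = sample(n, labeled=True)

# Step 1 (unsupervised pretraining): approximate MLE over the mixture class Phi.
phi = GaussianMixture(n_components=2, random_state=0).fit(x_unlabeled)

# Step 2 (downstream ERM): fit a predictor from Psi on the representation phi(x).
rep_train = phi.predict_proba(x_labeled)
psi = LogisticRegression().fit(rep_train, y_labeled)

# The representation reduces the downstream problem to a low-dimensional one,
# so the n labeled examples only need to cover the (smaller) class Psi.
x_test, y_test = sample(5_000, labeled=True)
print("downstream accuracy:", psi.score(phi.predict_proba(x_test), y_test))
```

In this sketch the unlabeled sample controls how well the representation is learned and the labeled sample controls the downstream fit, mirroring the two terms √(CΦ/m) and √(CΨ/n) in the stated excess risk bound.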

Original language: English (US)
State: Published - 2024
Event: 12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, Austria
Duration: May 7, 2024 – May 11, 2024

Conference

Conference: 12th International Conference on Learning Representations, ICLR 2024
Country/Territory: Austria
City: Hybrid, Vienna
Period: 5/7/24 – 5/11/24

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language
