Early Stopping as Nonparametric Variational Inference

Research output: Contribution to conference › Paper › peer-review

67 Scopus citations

Abstract

We show that unconverged stochastic gradient descent can be interpreted as sampling from a nonparametric approximate posterior distribution. This distribution is implicitly defined by the transformation of an initial distribution by a sequence of optimization steps. By tracking the change in entropy of this distribution during optimization, we give a scalable, unbiased estimate of a variational lower bound on the log marginal likelihood. This bound can be used to select hyperparameters in place of cross-validation. This Bayesian interpretation of SGD also suggests new overfitting-resistant optimization procedures, and gives a theoretical foundation for early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.
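The core idea can be sketched in a few lines. The following is a minimal illustrative sketch, not the authors' code: it runs gradient ascent from a Gaussian initialization on a toy log joint, tracks the entropy change of the implicitly defined distribution at each step via a first-order estimate of the log-determinant of the step's Jacobian (using a Hutchinson-style Hessian-trace probe), and adds the accumulated entropy to the log joint to form a single-sample estimate of the lower bound. All names (`log_joint`, `hvp`, `eta`) and the toy model are assumptions for illustration, not taken from the paper.

```python
# A minimal sketch of the idea in the abstract, not the authors' code.
# It runs gradient ascent from a Gaussian initialization on a toy log joint,
# tracks the entropy change of the implicitly defined distribution at each
# step, and reports a single-sample estimate of the variational lower bound
#     L = E_q[log p(theta, data)] + H[q].
import numpy as np

rng = np.random.default_rng(0)
dim = 5

# Toy log joint log p(theta, data): an unnormalized Gaussian (assumed model).
A = np.diag(np.linspace(0.5, 2.0, dim))

def log_joint(theta):
    return -0.5 * theta @ A @ theta

def grad_log_joint(theta):
    return -A @ theta

def hvp(theta, v, eps=1e-5):
    """Hessian-vector product of log_joint, by finite differences."""
    return (grad_log_joint(theta + eps * v)
            - grad_log_joint(theta - eps * v)) / (2 * eps)

eta = 0.05       # step size
num_steps = 200
sigma0 = 1.0     # stddev of the Gaussian initial distribution q0

theta = sigma0 * rng.standard_normal(dim)
# Entropy of the initial Gaussian q0 = N(0, sigma0^2 I).
entropy = 0.5 * dim * (1.0 + np.log(2.0 * np.pi)) + dim * np.log(sigma0)

for _ in range(num_steps):
    # Each step theta' = theta + eta * grad has Jacobian I + eta * H, so
    # the entropy changes by log|det(I + eta H)| ~= eta * tr(H) for small
    # eta. A Rademacher probe gives a one-sample estimate of tr(H).
    v = rng.choice([-1.0, 1.0], size=dim)
    entropy += eta * (v @ hvp(theta, v))
    theta = theta + eta * grad_log_joint(theta)

# Single-sample estimate of the lower bound at the stopping point.
print("lower-bound estimate:", log_joint(theta) + entropy)
```

The trace probe keeps the per-step overhead to roughly one extra gradient evaluation, which is what makes this kind of estimator scalable; stopping early trades expected log joint against the entropy the distribution still retains.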

Original language: English (US)
Pages: 1070-1077
Number of pages: 8
State: Published - 2016
Externally published: Yes
Event: 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016 - Cadiz, Spain
Duration: May 9, 2016 - May 11, 2016

Conference

Conference: 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016
Country/Territory: Spain
City: Cadiz
Period: 5/9/16 - 5/11/16

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Statistics and Probability
