Altitude training: Strong bounds for single-layer dropout

Stefan Wager, William Fithian, Sida Wang, Percy Liang

Research output: Contribution to journal › Conference article › peer-review

29 Scopus citations

Abstract

Dropout training, originally designed for deep neural networks, has been successful on high-dimensional single-layer natural language tasks. This paper proposes a theoretical explanation for this phenomenon: we show that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization. Dropout achieves this gain much like a marathon runner who practices at altitude: once a classifier learns to perform reasonably well on training examples that have been artificially corrupted by dropout, it will do very well on the uncorrupted test set. We also show that, under similar conditions, dropout preserves the Bayes decision boundary and should therefore induce minimal bias in high dimensions.
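For intuition, here is a minimal sketch of single-layer dropout training, assuming a binary logistic-regression classifier on bag-of-words features; the function names, hyperparameters, and the inverted-dropout rescaling below are illustrative assumptions, not details taken from the paper. The key idea it shows is the "altitude training" mechanism from the abstract: each gradient step sees training examples artificially corrupted by dropout, while test-time predictions use the uncorrupted features.

import numpy as np

def train_dropout_logistic(X, y, dropout_rate=0.5, lr=0.1, n_epochs=50, seed=0):
    """Single-layer (logistic regression) training with input dropout.

    Each epoch, every feature of every training example is independently
    zeroed with probability `dropout_rate`; surviving features are rescaled
    by 1 / (1 - dropout_rate) so their expectation matches the uncorrupted
    inputs seen at test time.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        # Artificially corrupt the training examples ("training at altitude").
        mask = rng.random(X.shape) >= dropout_rate
        X_drop = X * mask / (1.0 - dropout_rate)
        # One full-batch gradient step on the logistic loss over corrupted inputs.
        p = 1.0 / (1.0 + np.exp(-X_drop @ w))
        w -= lr * (X_drop.T @ (p - y) / n)
    return w

def predict(w, X):
    # Test-time prediction uses the uncorrupted features.
    return (X @ w > 0).astype(int)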

Original language: English (US)
Pages (from-to): 100-108
Number of pages: 9
Journal: Advances in Neural Information Processing Systems
Volume: 1
Issue number: January
State: Published - 2014
Event: 28th Annual Conference on Neural Information Processing Systems 2014, NIPS 2014 - Montreal, Canada
Duration: Dec 8 2014 to Dec 13 2014

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

