Adaptive subgradient methods for online learning and stochastic optimization

John Duchi, Elad Hazan, Yoram Singer

Research output: Chapter in Book/Report/Conference proceedingConference contribution

190 Scopus citations

Abstract

We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. The adaptation, in essence, allows us to find needles in haystacks in the form of very predictive yet rarely observed features. Our paradigm stems from recent advances in online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies the task of setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We corroborate our theoretical results with experiments on a text classification task, showing substantial improvements for classification with sparse datasets.

Original languageEnglish (US)
Title of host publicationCOLT 2010 - The 23rd Conference on Learning Theory
Pages257-269
Number of pages13
StatePublished - 2010
Event23rd Conference on Learning Theory, COLT 2010 - Haifa, Israel
Duration: Jun 27 2010Jun 29 2010

Publication series

NameCOLT 2010 - The 23rd Conference on Learning Theory

Other

Other23rd Conference on Learning Theory, COLT 2010
CountryIsrael
CityHaifa
Period6/27/106/29/10

All Science Journal Classification (ASJC) codes

  • Education

Fingerprint Dive into the research topics of 'Adaptive subgradient methods for online learning and stochastic optimization'. Together they form a unique fingerprint.

Cite this