PG-TS: Improved Thompson sampling for logistic contextual bandits

Bianca Dumitrascu, Karen Feng, Barbara Engelhardt Martin

Research output: Contribution to journalConference article

1 Scopus citations

Abstract

We address the problem of regret minimization in logistic contextual bandits, where a learner decides among sequential actions or arms given their respective contexts to maximize binary rewards. Using a fast inference procedure with Pólya-Gamma distributed augmentation variables, we propose an improved version of Thompson Sampling, a Bayesian formulation of contextual bandits with near-optimal performance. Our approach, Pólya-Gamma augmented Thompson Sampling (PG-TS), achieves state-of-the-art performance on simulated and real data. PG-TS explores the action space efficiently and exploits high-reward arms, quickly converging to solutions of low regret. Its explicit estimation of the posterior distribution of the context feature covariance leads to substantial empirical gains over approximate approaches. PG-TS is the first approach to demonstrate the benefits of Pólya-Gamma augmentation in bandits and to propose an efficient Gibbs sampler for approximating the analytically unsolvable integral of logistic contextual bandits.

Original languageEnglish (US)
Pages (from-to)4624-4633
Number of pages10
JournalAdvances in Neural Information Processing Systems
Volume2018-December
StatePublished - Jan 1 2018
Event32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: Dec 2 2018Dec 8 2018

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint Dive into the research topics of 'PG-TS: Improved Thompson sampling for logistic contextual bandits'. Together they form a unique fingerprint.

  • Cite this