Context-lumpable stochastic bandits

  • Chung Wei Lee
  • , Qinghua Liu
  • , Yasin Abbasi-Yadkori
  • , Chi Jin
  • , Tor Lattimore
  • , Csaba Szepesvári

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

We consider a contextual bandit problem with S contexts and K actions. In each round t = 1, 2, ... the learner observes a random context and chooses an action based on its past experience. The learner then observes a random reward whose mean is a function of the context and the action for the round. Under the assumption that the contexts can be lumped into r ≤ min{S, K} groups such that the mean reward for the various actions is the same for any two contexts that are in the same group, we give an algorithm that outputs an ε-optimal policy after using at most Oe(r(S + K)/ε2) samples with high probability and provide a matching Ω(r(S + K)/ε2) lower bound. In the regret minimization setting, we give an algorithm whose cumulative regret up to time T is bounded by (Equation presented). To the best of our knowledge, we are the first to show the near-optimal sample complexity in the PAC setting and Oe(ppoly(r)(S + K)T) minimax regret in the online setting for this problem. We also show our algorithms can be applied to more general low-rank bandits and get improved regret bounds in some scenarios.

Original languageEnglish (US)
Title of host publicationAdvances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
EditorsA. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
PublisherNeural information processing systems foundation
ISBN (Electronic)9781713899921
StatePublished - 2023
Event37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: Dec 10 2023Dec 16 2023

Publication series

NameAdvances in Neural Information Processing Systems
Volume36
ISSN (Print)1049-5258

Conference

Conference37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/TerritoryUnited States
CityNew Orleans
Period12/10/2312/16/23

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Information Systems
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Context-lumpable stochastic bandits'. Together they form a unique fingerprint.

Cite this