Contextual bandits with linear Payoff functions

Wei Chu, Lihong Li, Lev Reyzin, Robert E. Schapire

Research output: Contribution to journalConference articlepeer-review

602 Scopus citations

Abstract

In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear Payoff functions. For T rounds, K actions, and d dimensional feature vectors, we prove an O(√Td ln 3(KT ln(T)/δ)) regret bound that holds with probability 1-δ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of Ω( √Td) for this setting, matching the upper bound up to logarithmic factors.

Original languageEnglish (US)
Pages (from-to)208-214
Number of pages7
JournalJournal of Machine Learning Research
Volume15
StatePublished - 2011
Event14th International Conference on Artificial Intelligence and Statistics, AISTATS 2011 - Fort Lauderdale, FL, United States
Duration: Apr 11 2011Apr 13 2011

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Contextual bandits with linear Payoff functions'. Together they form a unique fingerprint.

Cite this