Abstract
In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear Payoff functions. For T rounds, K actions, and d dimensional feature vectors, we prove an O(√Td ln 3(KT ln(T)/δ)) regret bound that holds with probability 1-δ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of Ω( √Td) for this setting, matching the upper bound up to logarithmic factors.
Original language | English (US) |
---|---|
Pages (from-to) | 208-214 |
Number of pages | 7 |
Journal | Journal of Machine Learning Research |
Volume | 15 |
State | Published - 2011 |
Event | 14th International Conference on Artificial Intelligence and Statistics, AISTATS 2011 - Fort Lauderdale, FL, United States Duration: Apr 11 2011 → Apr 13 2011 |
All Science Journal Classification (ASJC) codes
- Software
- Artificial Intelligence
- Control and Systems Engineering
- Statistics and Probability