Abstract
We obtain minimax lower bounds on the regret for the classical two-armed bandit problem. We provide a finite-sample minimax version of the well-known log n asymptotic lower bound of Lai and Robbins. Also, in contrast to the log n asymptotic results on the regret, we show that the minimax regret is achieved by mere random guessing under fairly mild conditions on the set of allowable configurations of the two arms. That is, we show that for every allocation rule and for every n, there is a configuration such that the regret at time n is at least 1-ε times the regret of random guessing, where ε is any small positive constant.
Original language | English (US) |
---|---|
Pages (from-to) | 2293-2297 |
Number of pages | 5 |
Journal | Proceedings of the IEEE Conference on Decision and Control |
Volume | 3 |
State | Published - 1997 |
Event | Proceedings of the 1997 36th IEEE Conference on Decision and Control. Part 1 (of 5) - San Diego, CA, USA Duration: Dec 10 1997 → Dec 12 1997 |
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Modeling and Simulation
- Control and Optimization