Abstract
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 660-681 |
| Number of pages | 22 |
| Journal | Annals of Statistics |
| Volume | 44 |
| Issue number | 2 |
| DOIs | |
| State | Published - Apr 2016 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty
Keywords
- Batches
- Grouped clinical trials, sample size determination, switching cost
- Multi-armed bandit problems
- Multi-phase allocation
- Regret bounds