Abstract
We consider a class of multi-armed bandit problems where the reward obtained by pulling an arm is drawn from an exponential distribution whose parameter is unknown. A Bayesian model with independent gamma priors is used to represent our beliefs and uncertainty about the exponential parameters. We derive a precise expression for the marginal value of information in this problem, which allows us to create a new knowledge gradient (KG) policy for making decisions. The policy is practical and easy to implement, making a case for value of information as a general approach to optimal learning problems with many different types of learning models.
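As a rough illustration of the model described above, the sketch below makes several assumptions. Each arm's reward W is exponential with unknown rate λ. The prior on λ is Gamma(a, b) with shape a and rate b. The quantity to maximize is the expected reward 1/λ, whose posterior mean is b/(a - 1) when a > 1. The one-step value of sampling an arm is computed in closed form from the Lomax (Pareto II) predictive distribution P(W > w) = (b/(b + w))^a. This is our own derivation under those assumptions and is not claimed to reproduce the paper's expression term by term. The names `GammaBelief`, `kg_factor` and `kg_policy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GammaBelief:
    """Gamma(shape=a, rate=b) belief about the rate lambda of one exponential arm.

    Assumption (not from the source): reward W ~ Exponential(lambda) has mean
    1/lambda, so the posterior mean of the expected reward is b / (a - 1), a > 1.
    """
    a: float
    b: float

    def mean_reward(self) -> float:
        return self.b / (self.a - 1.0)

    def update(self, w: float) -> "GammaBelief":
        # Conjugate gamma-exponential update after observing one reward w.
        return GammaBelief(self.a + 1.0, self.b + w)


def kg_factor(beliefs: list[GammaBelief], x: int) -> float:
    """Expected one-step gain in max_i E[reward_i] from sampling arm x once.

    Hypothetical helper; assumes at least two arms. Uses the Lomax predictive
    P(W > w) = (b / (b + w))**a of the next reward from arm x.
    """
    a, b = beliefs[x].a, beliefs[x].b
    mu_x = beliefs[x].mean_reward()
    # Best current estimate among the competing arms.
    c = max(bel.mean_reward() for i, bel in enumerate(beliefs) if i != x)
    # After one more sample, arm x's estimate becomes (b + W) / a >= b / a.
    if c <= b / a:
        return 0.0  # arm x stays best for every possible outcome W
    # E[((b + W)/a - c)^+] = b * (b / (a c))**(a - 1) / (a (a - 1)),
    # written this way because b / (a c) < 1 here, which avoids overflow.
    expected_excess = b * (b / (a * c)) ** (a - 1.0) / (a * (a - 1.0))
    # E[max((b + W)/a, c)] - max(mu_x, c)
    return c + expected_excess - max(mu_x, c)


def kg_policy(beliefs: list[GammaBelief]) -> int:
    """Return the arm with the largest KG factor (ties go to the lowest index)."""
    factors = [kg_factor(beliefs, x) for x in range(len(beliefs))]
    return max(range(len(beliefs)), key=lambda x: factors[x])
```

For example, with `beliefs = [GammaBelief(5.0, 8.0), GammaBelief(3.0, 5.0)]` the current estimates are 2.0 and 2.5, yet `kg_policy(beliefs)` selects arm 1 because its factor (17/216, about 0.079) exceeds arm 0's (about 0.067). Under these assumptions the factor is zero exactly when an arm's worst-case updated estimate b/a already beats every competitor, since one more sample could not change the decision.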
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 1363-1372 |
Number of pages | 10 |
Journal | Procedia Computer Science |
Volume | 4 |
State | Published - 2011 |
Event | 11th International Conference on Computational Science, ICCS 2011, Singapore, Singapore |
Duration | Jun 1 2011 → Jun 3 2011 |
All Science Journal Classification (ASJC) codes
- General Computer Science
Keywords
- Exponential rewards
- Knowledge gradient
- Multi-armed bandit
- Optimal learning