The value of information in multi-armed bandits with exponentially distributed rewards

Ilya O. Ryzhov, Warren Buckler Powell

Research output: Contribution to journal, Conference article, peer-review

7 Scopus citations

Abstract

We consider a class of multi-armed bandit problems where the reward obtained by pulling an arm is drawn from an exponential distribution whose parameter is unknown. A Bayesian model with independent gamma priors is used to represent our beliefs and uncertainty about the exponential parameters. We derive a precise expression for the marginal value of information in this problem, which allows us to create a new knowledge gradient (KG) policy for making decisions. The policy is practical and easy to implement, making a case for value of information as a general approach to optimal learning problems with many different types of learning models.
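The belief model described in the abstract can be illustrated with the standard gamma-exponential conjugate update. The sketch below is not taken from the paper: it assumes a shape/rate parametrization of the gamma prior, and the class name `GammaBelief` and its methods are hypothetical names chosen for illustration. It shows only how beliefs about one arm change after a reward is observed; it does not implement the knowledge gradient policy itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(shape, rate) belief about the unknown rate of one arm's
    exponential reward distribution (hypothetical helper, assumed
    shape/rate parametrization)."""

    shape: float
    rate: float

    def update(self, reward: float) -> "GammaBelief":
        # Conjugate update: observing one exponential reward w turns
        # Gamma(a, b) into Gamma(a + 1, b + w).
        if reward < 0:
            raise ValueError("exponential rewards are non-negative")
        return GammaBelief(self.shape + 1.0, self.rate + reward)

    def expected_reward(self) -> float:
        # The mean reward is 1 / lambda; under Gamma(a, b) its
        # expectation is b / (a - 1), which is finite only for a > 1.
        if self.shape <= 1.0:
            raise ValueError("expected reward is undefined for shape <= 1")
        return self.rate / (self.shape - 1.0)


if __name__ == "__main__":
    belief = GammaBelief(shape=3.0, rate=4.0)
    print(belief.expected_reward())
    belief = belief.update(5.0)
    print(belief, belief.expected_reward())
```

Because the arms have independent priors, a policy would keep one such belief per arm and update only the arm that was pulled.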

Original language: English (US)
Pages (from-to): 1363-1372
Number of pages: 10
Journal: Procedia Computer Science
Volume: 4
State: Published - 2011
Event: 11th International Conference on Computational Science, ICCS 2011 - Singapore, Singapore
Duration: Jun 1 2011 - Jun 3 2011

All Science Journal Classification (ASJC) codes

  • General Computer Science

Keywords

  • Exponential rewards
  • Knowledge gradient
  • Multi-armed bandit
  • Optimal learning
