On the robustness of a one-period look-ahead policy in multi-armed bandit problems

Ilya O. Ryzhov, Peter I. Frazier, Warren Buckler Powell

Research output: Contribution to journalArticlepeer-review

11 Scopus citations

Abstract

We analyze the robustness of a knowledge gradient (KG) policy for the multi-armed bandit problem. The KG policy is based on a one-period look-ahead, which is known to underperform in other learning problems when the marginal value of information is non-concave. We present an adjustment that corrects for non-concavity and approximates a multi-step look-ahead, and compare its performance to the unadjusted KG policy and other heuristics. We provide guidance for determining when adjustment will improve performance, and when it is unnecessary. We present evidence suggesting that KG is generally robust in the multi-armed bandit setting, which argues in favour of KG as an alternative to index policies.

Original languageEnglish (US)
Pages (from-to)1635-1644
Number of pages10
JournalProcedia Computer Science
Volume1
Issue number1
DOIs
StatePublished - 2010

All Science Journal Classification (ASJC) codes

  • Computer Science(all)

Keywords

  • Bayesian learning
  • Knowledge gradient
  • Multi-armed bandit
  • Optimal learning

Cite this