The knowledge gradient algorithm for a general class of online learning problems

Ilya O. Ryzhov, Warren Buckler Powell, Peter I. Frazier

Research output: Contribution to journal / Article / peer-review

93 Scopus citations

Abstract

We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach also handles correlated prior beliefs about the rewards, a case that traditional multiarmed bandit methods do not address. Experiments show that our knowledge gradient (KG) policy performs competitively against the best-known approximation to the optimal policy in the classic bandit problem, and that it outperforms many learning policies in the correlated case.
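The one-period look-ahead idea can be illustrated in its simplest setting: independent Gaussian beliefs about each alternative. The sketch below is not the paper's implementation. It assumes independent normal beliefs, a known measurement-noise variance, and the standard knowledge-gradient factor nu_x = sigma_tilde_x * f(zeta_x) with f(z) = z*Phi(z) + phi(z), where sigma_tilde_x is the standard deviation of the change in the estimate of x after one more observation. The online rule scores each alternative as its current estimate plus a weighted KG factor. The parameter name horizon_weight is hypothetical, and the exact weight for the finite-horizon and discounted infinite-horizon cases should be checked against the paper. The correlated-beliefs case emphasized in the abstract is not covered by this sketch.

```python
import math


def norm_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), computed from math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_factors(mu, sigma2, noise2):
    """Knowledge-gradient factor nu_x for each alternative x.

    Assumes independent Gaussian beliefs: mu[x] and sigma2[x] are the
    posterior mean and variance of alternative x, and noise2 is the known
    variance of one noisy reward observation. Needs at least two alternatives.
    """
    nus = []
    for x in range(len(mu)):
        if sigma2[x] <= 0.0:
            # No remaining uncertainty: measuring x cannot change the belief.
            nus.append(0.0)
            continue
        # Std. dev. of the change in mu[x] caused by one more observation of x.
        sigma_tilde = sigma2[x] / math.sqrt(sigma2[x] + noise2)
        best_other = max(m for i, m in enumerate(mu) if i != x)
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        # f(z) = z * Phi(z) + phi(z)
        nus.append(sigma_tilde * (zeta * norm_cdf(zeta) + norm_pdf(zeta)))
    return nus


def online_kg_choice(mu, sigma2, noise2, horizon_weight):
    """Pick the alternative maximizing mu[x] + horizon_weight * nu[x].

    horizon_weight is a hypothetical parameter standing in for the value of
    information over the remaining horizon (for example, periods left, or
    gamma / (1 - gamma) under discounting); the exact form is an assumption.
    """
    nus = kg_factors(mu, sigma2, noise2)
    scores = [m + horizon_weight * v for m, v in zip(mu, nus)]
    return max(range(len(mu)), key=lambda i: scores[i])


if __name__ == "__main__":
    means = [1.0, 1.1]
    variances = [100.0, 1e-4]
    print(online_kg_choice(means, variances, 1.0, 0.0))   # greedy: picks index 1
    print(online_kg_choice(means, variances, 1.0, 10.0))  # explores index 0
```

With horizon_weight set to 0 the rule reduces to pure exploitation (pick the highest current estimate); larger weights shift the choice toward alternatives whose measurement is expected to be more informative.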

Original language: English (US)
Pages (from-to): 180-195
Number of pages: 16
Journal: Operations Research
Volume: 60
Issue number: 1
State: Published - Jan 2012

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Management Science and Operations Research

Keywords

  • Gittins index
  • Index policy
  • Knowledge gradient
  • Multiarmed bandit
  • Online learning
  • Optimal learning

