The knowledge gradient policy for offline learning with independent normal rewards

Peter Frazier, Warren Buckler Powell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

We define a new type of policy, the knowledge gradient policy, in the context of an offline learning problem. We show how to compute the knowledge gradient policy efficiently and demonstrate through Monte Carlo simulations that it performs as well as or better than a number of existing learning policies.
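The abstract does not state the computation itself. As a rough illustration only, the sketch below uses the closed form usually given for the knowledge gradient under independent normal beliefs with known measurement noise; whether it matches the paper's notation or derivation exactly is an assumption. The names `knowledge_gradient`, `kg_choice`, `mu`, `sigma2`, and `noise_var` are hypothetical and chosen for this example.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(mu, sigma2, noise_var):
    """Return one knowledge gradient value per alternative.

    Assumed setting (not taken from the abstract): independent normal
    beliefs with means `mu` and variances `sigma2`, and a common, known
    measurement noise variance `noise_var`. Needs at least two alternatives.
    """
    values = []
    for x in range(len(mu)):
        # Standard deviation of the change in the posterior mean of x
        # caused by a single measurement of x.
        sigma_tilde = sigma2[x] / math.sqrt(sigma2[x] + noise_var)
        if sigma_tilde == 0.0:
            # Nothing left to learn about an alternative known with certainty.
            values.append(0.0)
            continue
        best_other = max(m for i, m in enumerate(mu) if i != x)
        # Normalized distance between x and the best competing alternative.
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        values.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return values


def kg_choice(mu, sigma2, noise_var):
    """Index of the alternative with the largest knowledge gradient value."""
    values = knowledge_gradient(mu, sigma2, noise_var)
    return max(range(len(values)), key=values.__getitem__)
```

Under these assumptions, each value costs a constant number of operations once the best competing mean is known, so a full pass over the alternatives is cheap. For example, `kg_choice([1.0, 0.0], [0.0, 4.0], 1.0)` selects the second alternative, because the first is already known exactly and measuring it cannot change the final decision.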

Original language: English (US)
Title of host publication: Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
Pages: 143-150
Number of pages: 8
State: Published - 2007
Event: 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007 - Honolulu, HI, United States
Duration: Apr 1 2007 to Apr 5 2007

Publication series

Name: Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007

Other

Other: 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
Country: United States
City: Honolulu, HI
Period: 4/1/07 to 4/5/07

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software
