TY - GEN
T1 - The knowledge gradient policy for offline learning with independent normal rewards
AU - Frazier, Peter
AU - Powell, Warren Buckler
N1 - Copyright:
Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2007
Y1 - 2007
N2 - We define a new type of policy, the knowledge gradient policy, in the context of an offline learning problem. We show how to compute the knowledge gradient policy efficiently and demonstrate through Monte Carlo simulations that it performs as well as or better than a number of existing learning policies.
AB - We define a new type of policy, the knowledge gradient policy, in the context of an offline learning problem. We show how to compute the knowledge gradient policy efficiently and demonstrate through Monte Carlo simulations that it performs as well as or better than a number of existing learning policies.
UR - http://www.scopus.com/inward/record.url?scp=34548782359&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34548782359&partnerID=8YFLogxK
U2 - 10.1109/ADPRL.2007.368181
DO - 10.1109/ADPRL.2007.368181
M3 - Conference contribution
AN - SCOPUS:34548782359
SN - 1-4244-0706-0
SN - 978-1-4244-0706-4
T3 - Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
SP - 143
EP - 150
BT - Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
T2 - 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
Y2 - 1 April 2007 through 5 April 2007
ER -