TY - GEN
T1 - The knowledge gradient algorithm for online subset selection
AU - Ryzhov, Ilya O.
AU - Powell, Warren
N1 - Copyright 2012 Elsevier B.V., All rights reserved.
PY - 2009
Y1 - 2009
N2 - We derive a one-period look-ahead policy for online subset selection problems, where learning about one subset also gives us information about other subsets. The subset selection problem is treated as a multi-armed bandit problem with correlated prior beliefs. We show that our decision rule is easily computable, and present experimental evidence that the policy is competitive against other online learning policies.
UR - http://www.scopus.com/inward/record.url?scp=67650505320&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67650505320&partnerID=8YFLogxK
U2 - 10.1109/ADPRL.2009.4927537
DO - 10.1109/ADPRL.2009.4927537
M3 - Conference contribution
AN - SCOPUS:67650505320
SN - 9781424427611
T3 - 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings
SP - 137
EP - 144
BT - 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings
T2 - 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009
Y2 - 30 March 2009 through 2 April 2009
ER -