TY - GEN
T1 - Bias-corrected Q-learning to control max-operator bias in Q-learning
AU - Lee, Donghun
AU - Defourny, Boris
AU - Powell, Warren Buckler
PY - 2013
Y1 - 2013
N2 - We identify a class of stochastic control problems with highly random rewards and high discount factor which induce high levels of statistical error in the estimated action-value function. This produces significant levels of max-operator bias in Q-learning, which can induce the algorithm to diverge for millions of iterations. We present a bias-corrected Q-learning algorithm with asymptotically unbiased resistance against the max-operator bias, and show that the algorithm asymptotically converges to the optimal policy, as Q-learning does. We show experimentally that bias-corrected Q-learning performs well in a domain with highly random rewards where Q-learning and other related algorithms suffer from the max-operator bias.
UR - http://www.scopus.com/inward/record.url?scp=84891539165&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84891539165&partnerID=8YFLogxK
U2 - 10.1109/ADPRL.2013.6614994
DO - 10.1109/ADPRL.2013.6614994
M3 - Conference contribution
AN - SCOPUS:84891539165
SN - 9781467359252
T3 - IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL
SP - 93
EP - 99
BT - Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013
T2 - 2013 4th IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013
Y2 - 16 April 2013 through 19 April 2013
ER -