Bias-corrected Q-learning to control max-operator bias in Q-learning

Donghun Lee, Boris Defourny, Warren Buckler Powell

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

10 Scopus citations

Abstract

We identify a class of stochastic control problems with highly random rewards and high discount factor which induce high levels of statistical error in the estimated action-value function. This produces significant levels of max-operator bias in Q-learning, which can induce the algorithm to diverge for millions of iterations. We present a bias-corrected Q-learning algorithm with asymptotically unbiased resistance against the max-operator bias, and show that the algorithm asymptotically converges to the optimal policy, as Q-learning does. We show experimentally that bias-corrected Q-learning performs well in a domain with highly random rewards where Q-learning and other related algorithms suffer from the max-operator bias.
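The max-operator bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the paper's bias-corrected algorithm; it only shows, under assumed settings, why taking a max over noisy value estimates overestimates the true maximum. Every action has a true mean reward of 0, so the true maximum is 0, yet the maximum of the sample means is positive on average. The function names, the Gaussian reward model, and all parameter values are illustrative assumptions and do not come from the paper.

```python
import random
import statistics


def max_of_sample_means(num_actions, samples_per_action, reward_std, rng):
    """Return the max over actions of the sample-mean reward.

    Assumption for illustration: every action has true mean reward 0
    and Gaussian noise, so the true maximum value is exactly 0.
    """
    means = []
    for _ in range(num_actions):
        rewards = [rng.gauss(0.0, reward_std) for _ in range(samples_per_action)]
        means.append(statistics.fmean(rewards))
    return max(means)


def estimate_max_bias(num_actions=10, samples_per_action=5,
                      reward_std=1.0, trials=2000, seed=0):
    """Average the max of noisy estimates over many independent trials.

    Because the true maximum is 0, the returned average is itself an
    estimate of the bias introduced by the max operator.
    """
    rng = random.Random(seed)
    draws = [
        max_of_sample_means(num_actions, samples_per_action, reward_std, rng)
        for _ in range(trials)
    ]
    return statistics.fmean(draws)
```

With the default settings, `estimate_max_bias()` returns a clearly positive number even though no action is better than any other. With `num_actions=1` there is no max over alternatives and the estimate stays close to zero. Raising `reward_std` or lowering `samples_per_action` increases the bias, which matches the abstract's point that highly random rewards aggravate the problem.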

Original language: English (US)
Title of host publication: Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013
Pages: 93-99
Number of pages: 7
DOIs: https://doi.org/10.1109/ADPRL.2013.6614994
State: Published - Dec 1 2013
Event: 2013 4th IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013 - Singapore, Singapore
Duration: Apr 16 2013 to Apr 19 2013

Publication series

Name: IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL
ISSN (Print): 2325-1824
ISSN (Electronic): 2325-1867

Other

Other: 2013 4th IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013
Country: Singapore
City: Singapore
Period: 4/16/13 to 4/19/13

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software

Cite this

Lee, D., Defourny, B., & Powell, W. B. (2013). Bias-corrected Q-learning to control max-operator bias in Q-learning. In Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013 (pp. 93-99). [6614994] (IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL). https://doi.org/10.1109/ADPRL.2013.6614994