### Abstract

We identify a class of stochastic control problems with highly random rewards and a high discount factor, which induce high levels of statistical error in the estimated action-value function. This error produces significant max-operator bias in Q-learning, which can cause the algorithm to diverge for millions of iterations. We present a bias-corrected Q-learning algorithm with asymptotically unbiased resistance against the max-operator bias, and show that the algorithm asymptotically converges to the optimal policy, as Q-learning does. We show experimentally that bias-corrected Q-learning performs well in a domain with highly random rewards where Q-learning and other related algorithms suffer from the max-operator bias.
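The max-operator bias named in the abstract can be illustrated with a small simulation. The sketch below is not the paper's bias-corrected algorithm; it only shows, under assumed settings, that taking the maximum over noisy action-value estimates overestimates the true maximum. The function name `max_operator_bias` and all of its parameters are hypothetical. Every action is assumed to have a true value of 0 and Gaussian reward noise, so any positive result is pure bias.

```python
import random


def max_operator_bias(num_actions=10, num_samples=5, reward_std=1.0,
                      trials=2000, seed=0):
    """Estimate E[max_a Q_hat(a)] when every true action value is 0.

    Illustrative assumption: each Q_hat(a) is the sample mean of
    `num_samples` zero-mean Gaussian rewards. The true maximum is 0,
    so the returned average is the bias introduced by the max operator.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Noisy estimate of each action value from a few reward samples.
        q_hat = [
            sum(rng.gauss(0.0, reward_std) for _ in range(num_samples)) / num_samples
            for _ in range(num_actions)
        ]
        # The max over noisy estimates is what the Q-learning target uses.
        total += max(q_hat)
    return total / trials


if __name__ == "__main__":
    for std in (0.5, 1.0, 3.0):
        print(f"reward_std={std}: estimated bias = {max_operator_bias(reward_std=std):.3f}")
```

In this toy setting the bias grows with the reward noise and disappears when there is only one action to maximize over, which is consistent with the abstract's point that highly random rewards aggravate the problem.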

Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013 |
Pages | 93-99 |
Number of pages | 7 |
DOIs | https://doi.org/10.1109/ADPRL.2013.6614994 |
State | Published - Dec 1 2013 |
Event | 2013 4th IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013, Singapore, Singapore. Duration: Apr 16 2013 → Apr 19 2013 |

### Publication series

Name | IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL |
---|---|
ISSN (Print) | 2325-1824 |
ISSN (Electronic) | 2325-1867 |

### Other

Other | 2013 4th IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013 |
---|---|
Country | Singapore |
City | Singapore |
Period | 4/16/13 → 4/19/13 |

### All Science Journal Classification (ASJC) codes

- Computational Theory and Mathematics
- Computer Science Applications
- Software

## Cite this

Bias-corrected Q-learning to control max-operator bias in Q-learning. In *Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013* (pp. 93-99). [6614994] (IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL). https://doi.org/10.1109/ADPRL.2013.6614994