Reward prediction error as an exploration objective in deep RL

Riley Simmons-Edler, Ben Eisner, Daniel Yang, Anthony Bisulco, Eric Mitchell, Sebastian Seung, Daniel Lee

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations

Abstract

A major challenge in reinforcement learning is exploration, when local dithering methods such as ?-greedy sampling are insufficient to solve a given task. Many recent methods have proposed to intrinsically motivate an agent to seek novel states, driving the agent to discover improved reward. However, while state-novelty exploration methods are suitable for tasks where novel observations correlate well with improved reward, they may not explore more efficiently than ?-greedy approaches in environments where the two are not well-correlated. In this paper, we distinguish between exploration tasks in which seeking novel states aids in finding new reward, and those where it does not, such as goal-conditioned tasks and escaping local reward maxima. We propose a new exploration objective, maximizing the reward prediction error (RPE) of a value function trained to predict extrinsic reward. We then propose a deep reinforcement learning method, QXplore, which exploits the temporal difference error of a Q-function to solve hard exploration tasks in high-dimensional MDPs. We demonstrate the exploration behavior of QXplore on several OpenAI Gym MuJoCo tasks and Atari games and observe that QXplore is comparable to or better than a baseline state-novelty method in all cases, outperforming the baseline on tasks where state novelty is not well-correlated with improved reward.

Original languageEnglish (US)
Title of host publicationProceedings of the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020
EditorsChristian Bessiere
PublisherInternational Joint Conferences on Artificial Intelligence
Pages2816-2823
Number of pages8
ISBN (Electronic)9780999241165
StatePublished - 2020
Externally publishedYes
Event29th International Joint Conference on Artificial Intelligence, IJCAI 2020 - Yokohama, Japan
Duration: Jan 1 2021 → …

Publication series

NameIJCAI International Joint Conference on Artificial Intelligence
Volume2021-January
ISSN (Print)1045-0823

Conference

Conference29th International Joint Conference on Artificial Intelligence, IJCAI 2020
Country/TerritoryJapan
CityYokohama
Period1/1/21 → …

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Reward prediction error as an exploration objective in deep RL'. Together they form a unique fingerprint.

Cite this