OptionGAN: Learning joint reward-policy options using generative adversarial inverse reinforcement learning

Peter Henderson, Wei-Di Chang, Pierre-Luc Bacon, David Meger, Joelle Pineau, Doina Precup

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

37 Scopus citations

Abstract

Reinforcement learning has shown promise in learning policies that can solve complex problems. However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectories arising from a diverse set of underlying reward functions rather than a single one. Thus, in inverse reinforcement learning, it is useful to consider such a decomposition. The options framework in reinforcement learning is specifically designed to decompose policies in a similar light. We therefore extend the options framework and propose a method to simultaneously recover reward options in addition to policy options. We leverage adversarial methods to learn joint reward-policy options using only observed expert states. We show that this approach works well in both simple and complex continuous control tasks and shows significant performance increases in one-shot transfer learning.

Original language: English (US)
Title of host publication: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Publisher: AAAI Press
Pages: 3199-3206
Number of pages: 8
ISBN (Electronic): 9781577358008
State: Published - 2018
Externally published: Yes
Event: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 - New Orleans, United States
Duration: Feb 2 2018 → Feb 7 2018

Publication series

Name: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Conference

Conference: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Country/Territory: United States
City: New Orleans
Period: 2/2/18 → 2/7/18

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
