Contextual decision processes with low Bellman rank are PAC-learnable

Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

Research output: Chapter in Book/Report/Conference proceedingConference contribution

86 Scopus citations

Abstract

This paper studies systematic exploration for reinforcement learning (RL) with rich observations and function approximation. We introduce contextual decision processes (CDPs), that unify most prior RL settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in CDPs and is naturally small for many well-studied RL models. Our second contribution is a new RL algorithm that does systematic exploration to learn near-optimal behavior in CDPs with low Bellman rank. The algorithm requires a number of samples that is polynomial in all relevant parameters but independent of the number of unique contexts. Our approach uses Bellman error minimization with optimistic exploration and provides new insights into efficient exploration for RL with function approximation.

Original languageEnglish (US)
Title of host publication34th International Conference on Machine Learning, ICML 2017
PublisherInternational Machine Learning Society (IMLS)
Pages2671-2707
Number of pages37
ISBN (Electronic)9781510855144
StatePublished - 2017
Externally publishedYes
Event34th International Conference on Machine Learning, ICML 2017 - Sydney, Australia
Duration: Aug 6 2017Aug 11 2017

Publication series

Name34th International Conference on Machine Learning, ICML 2017
Volume4

Other

Other34th International Conference on Machine Learning, ICML 2017
Country/TerritoryAustralia
CitySydney
Period8/6/178/11/17

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Fingerprint

Dive into the research topics of 'Contextual decision processes with low Bellman rank are PAC-learnable'. Together they form a unique fingerprint.

Cite this