TY - GEN
T1 - Contextual decision processes with low Bellman rank are PAC-learnable
AU - Jiang, Nan
AU - Krishnamurthy, Akshay
AU - Agarwal, Alekh
AU - Langford, John
AU - Schapire, Robert E.
N1 - Publisher Copyright:
© Copyright 2017 by the author(s).
PY - 2017
Y1 - 2017
N2 - This paper studies systematic exploration for reinforcement learning (RL) with rich observations and function approximation. We introduce contextual decision processes (CDPs), that unify most prior RL settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in CDPs and is naturally small for many well-studied RL models. Our second contribution is a new RL algorithm that does systematic exploration to learn near-optimal behavior in CDPs with low Bellman rank. The algorithm requires a number of samples that is polynomial in all relevant parameters but independent of the number of unique contexts. Our approach uses Bellman error minimization with optimistic exploration and provides new insights into efficient exploration for RL with function approximation.
AB - This paper studies systematic exploration for reinforcement learning (RL) with rich observations and function approximation. We introduce contextual decision processes (CDPs), that unify most prior RL settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in CDPs and is naturally small for many well-studied RL models. Our second contribution is a new RL algorithm that does systematic exploration to learn near-optimal behavior in CDPs with low Bellman rank. The algorithm requires a number of samples that is polynomial in all relevant parameters but independent of the number of unique contexts. Our approach uses Bellman error minimization with optimistic exploration and provides new insights into efficient exploration for RL with function approximation.
UR - http://www.scopus.com/inward/record.url?scp=85047017616&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047017616&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85047017616
T3 - 34th International Conference on Machine Learning, ICML 2017
SP - 2671
EP - 2707
BT - 34th International Conference on Machine Learning, ICML 2017
PB - International Machine Learning Society (IMLS)
T2 - 34th International Conference on Machine Learning, ICML 2017
Y2 - 6 August 2017 through 11 August 2017
ER -