Model-Based Reinforcement Learning with Value-Targeted Regression

Zeyu Jia, Lin F. Yang, Csaba Szepesvári, Mengdi Wang

Research output: Contribution to journal › Conference article › peer-review

33 Scopus citations


Reinforcement learning (RL) applies to control problems with large state and action spaces, so it is natural to consider RL with a parametric model. In this paper we focus on finite-horizon episodic RL where the transition model admits the linear parametrization P_θ(s'|s,a) = Σ_{j=1}^{d} θ_j P_j(s'|s,a), a mixture of d known base models. This parametrization provides universal function approximation and captures several useful models and applications. We propose an upper-confidence model-based RL algorithm with value-targeted model parameter estimation. The algorithm updates the estimate of θ by recursively solving a regression problem whose target is the latest value estimate. We demonstrate the efficiency of our algorithm by proving an expected regret bound of Õ(d√(H³T)), where H, T, and d are the horizon, the total number of steps, and the dimension of θ, respectively. This regret bound is independent of the total number of states or actions, and is close to a lower bound of Ω(√(HdT)).
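The regression step described in the abstract can be illustrated with a minimal NumPy sketch. Everything below is a hypothetical toy setup (random base models, a fixed value function V, and the sample count and ridge parameter are illustrative choices, not the paper's), but the estimator is the one the abstract names: regress the value of the next state on the expected values predicted by each base model, and solve for the mixing weights θ.

```python
import numpy as np

# Toy problem sizes (illustrative, not from the paper).
rng = np.random.default_rng(0)
S, A, d = 5, 3, 4

# d known base transition models P_j(s'|s,a): shape (d, S, A, S),
# each row a distribution over next states.
P_base = rng.dirichlet(np.ones(S), size=(d, S, A))

# Unknown mixing weights theta*; the true model is the linear mixture
# P_theta(s'|s,a) = sum_j theta_j P_j(s'|s,a).
theta_star = rng.dirichlet(np.ones(d))
P_true = np.tensordot(theta_star, P_base, axes=1)  # shape (S, A, S)

# A fixed "latest value estimate" V, used as the regression target.
V = rng.uniform(0.0, 1.0, size=S)

# Ridge regression theta_hat = argmin_theta
#   sum_t (<x_t, theta> - V(s'_t))^2 + lam * ||theta||^2,
# where feature x_t[j] = E_{s' ~ P_j(.|s_t, a_t)}[V(s')].
lam = 0.1
Gram = lam * np.eye(d)
b = np.zeros(d)
for _ in range(20000):
    s, a = rng.integers(S), rng.integers(A)
    s_next = rng.choice(S, p=P_true[s, a])      # sample from the true mixture
    x = P_base[:, s, a, :] @ V                  # value of V under each base model
    Gram += np.outer(x, x)
    b += x * V[s_next]
theta_hat = np.linalg.solve(Gram, b)
```

Since E[V(s') | s, a] = ⟨x, θ*⟩ under the mixture model, θ̂ is a consistent estimate of θ*; in the full algorithm this regression is re-solved as the value estimate V changes, with an upper-confidence bonus added for exploration.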

Original language: English (US)
Pages (from-to): 666-686
Number of pages: 21
Journal: Proceedings of Machine Learning Research
State: Published - 2020
Externally published: Yes
Event: 2nd Annual Conference on Learning for Dynamics and Control, L4DC 2020 - Berkeley, United States
Duration: Jun 10, 2020 - Jun 11, 2020

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability


