Model-based reinforcement learning with value-targeted regression

Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin F. Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

91 Scopus citations

Abstract

This paper studies model-based reinforcement learning (RL) for regret minimization. We focus on finite-horizon episodic RL where the transition model $P$ belongs to a known family of models $\mathcal{P}$, a special case of which is when models in $\mathcal{P}$ take the form of linear mixtures: $P_\theta = \sum_{i=1}^{d} \theta_i P_i$. We propose a model-based RL algorithm that is based on the optimism principle: in each episode, the set of models that are consistent with the data collected is constructed. The criterion of consistency is based on the total squared error that the model incurs on the task of predicting state values as determined by the last value estimate along the transitions. The next value function is then chosen by solving the optimistic planning problem with the constructed set of models. We derive a bound on the regret, which, in the special case of linear mixtures, takes the form $\tilde{O}(d\sqrt{H^3 T})$, where $H$, $T$ and $d$ are the horizon, the total number of steps and the dimension of $\theta$, respectively. In particular, this regret bound is independent of the total number of states or actions, and is close to a lower bound $\Omega(\sqrt{HdT})$. For a general model family $\mathcal{P}$, the regret bound is derived based on the Eluder dimension.
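For the linear-mixture case, the regression step can be made concrete. If $P = \sum_{i=1}^{d} \theta_i P_i$, then for any value estimate $V$ we have $\mathbb{E}[V(s') \mid s, a] = \sum_{i} \theta_i (P_i V)(s, a)$, so the observed target $V(s')$ is linear in $\theta$ with features $\phi_V(s, a) = ((P_1 V)(s, a), \dots, (P_d V)(s, a))$. The Python sketch below is illustrative only and is not the authors' implementation. It assumes a single action, ignores the horizon, and realizes the consistency criterion as a ridge estimate with an ellipsoidal confidence set whose radius `beta` is a free parameter rather than the paper's choice. The names `vtr_features`, `ValueTargetedRidge` and `demo`, the toy base models, and the weights $\theta^* = (0.3, 0.7)$ are all hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m: Matrix, v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in m]


def vtr_features(base_models: Sequence[Matrix], s: int, v: Sequence[float]) -> Vector:
    """Hypothetical helper: phi_i = (P_i V)(s) = sum_{s'} P_i(s' | s) V(s').
    A single action is assumed for brevity."""
    return [dot(p[s], v) for p in base_models]


class ValueTargetedRidge:
    """Hypothetical helper: ridge regression of V(s') onto phi, keeping the
    inverse Gram matrix Lambda^{-1} updated with Sherman-Morrison."""

    def __init__(self, d: int, lam: float = 1.0) -> None:
        # Lambda starts as lam * I, so Lambda^{-1} starts as I / lam.
        self.inv: Matrix = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b: Vector = [0.0] * d  # running sum of phi * y

    def update(self, phi: Sequence[float], y: float) -> None:
        # (L + phi phi^T)^{-1} = L^{-1} - (L^{-1} phi)(L^{-1} phi)^T / (1 + phi^T L^{-1} phi)
        u = mat_vec(self.inv, phi)  # valid as a column because L^{-1} is symmetric
        denom = 1.0 + dot(phi, u)
        for i, ui in enumerate(u):
            for j, uj in enumerate(u):
                self.inv[i][j] -= ui * uj / denom
            self.b[i] += phi[i] * y

    def theta_hat(self) -> Vector:
        return mat_vec(self.inv, self.b)

    def optimistic_value(self, phi: Sequence[float], beta: float) -> float:
        # Closed-form max of theta^T phi over ||theta - theta_hat||_Lambda <= beta.
        width = math.sqrt(max(dot(phi, mat_vec(self.inv, phi)), 0.0))
        return dot(self.theta_hat(), phi) + beta * width


def demo(steps: int = 20000, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    p1 = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]  # toy "mostly stay" model
    p2 = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]  # toy "always move" model
    theta_star = [0.3, 0.7]  # hypothetical true mixture weights
    reg = ValueTargetedRidge(d=2)
    s = 0
    for t in range(steps):
        v = [1.0 if k == t % 3 else 0.0 for k in range(3)]  # stand-in value estimates
        phi = vtr_features([p1, p2], s, v)
        row = [theta_star[0] * a + theta_star[1] * b for a, b in zip(p1[s], p2[s])]
        s_next = rng.choices(range(3), weights=row)[0]
        reg.update(phi, v[s_next])  # the target is V(s'), not the next state itself
        s = s_next
    return reg.theta_hat()


if __name__ == "__main__":
    print(demo())
```

Running the module should print estimates close to the hypothetical $\theta^* = (0.3, 0.7)$. The Sherman-Morrison update keeps each step at $O(d^2)$ instead of re-inverting the Gram matrix. `optimistic_value` returns $\hat\theta^\top\phi + \beta\|\phi\|_{\Lambda^{-1}}$, the largest prediction any model in the ellipsoid makes, which is the kind of quantity an optimistic planner would back up.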

Original language: English (US)
Title of host publication: 37th International Conference on Machine Learning, ICML 2020
Editors: Hal Daume, Aarti Singh
Publisher: International Machine Learning Society (IMLS)
Pages: 440-451
Number of pages: 12
ISBN (Electronic): 9781713821120
State: Published - 2020
Event: 37th International Conference on Machine Learning, ICML 2020 - Virtual, Online
Duration: Jul 13 2020 to Jul 18 2020

Publication series

Name: 37th International Conference on Machine Learning, ICML 2020
Volume: PartF168147-1

Conference

Conference: 37th International Conference on Machine Learning, ICML 2020
City: Virtual, Online
Period: 7/13/20 to 7/18/20

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software
