Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient

Ming Yin, Mengdi Wang, Yu-Xiang Wang

Research output: Contribution to conference › Paper › peer-review

3 Scopus citations

Abstract

Offline reinforcement learning, which aims at optimizing sequential decision-making strategies with historical data, has been extensively applied in real-life applications. State-of-the-art algorithms usually leverage powerful function approximators (e.g., neural networks) to alleviate the sample complexity hurdle for better empirical performance. Despite the successes, a more systematic understanding of the statistical complexity of function approximation remains lacking. Towards bridging the gap, we take a step by considering offline reinforcement learning with differentiable function class approximation (DFA). This function class naturally incorporates a wide range of models with nonlinear/nonconvex structures. We show that offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning (PFQL) algorithm, and our results provide the theoretical basis for understanding a variety of practical heuristics that rely on Fitted-Q-Iteration-style design. In addition, we further improve our guarantee with a tighter instance-dependent characterization. We hope our work could draw interest in studying reinforcement learning with differentiable function approximation beyond the scope of current research.
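
To make the Fitted-Q-Iteration-style design named in the abstract concrete, below is a minimal illustrative Python sketch of pessimistic fitted Q-iteration on an offline dataset. This is not the paper's PFQL implementation: it assumes a finite-horizon MDP with discrete actions, a user-supplied feature map phi standing in for the gradient of the differentiable model at the fitted parameter (for a linear-in-features model the two coincide), and illustrative hyperparameters beta (pessimism coefficient) and lam (ridge regularization); all names are hypothetical.

    import numpy as np

    def pessimistic_fqi(data, phi, num_actions, horizon, dim, beta=1.0, lam=1.0):
        """Backward-inductive pessimistic fitted Q-iteration on offline data.

        data[h] = (S, A, R, S2): per-step arrays of states, actions,
        rewards, and next states; phi(s, a) returns a length-dim feature
        vector (stand-in for the model gradient). Returns per-step
        (theta, Lambda_inv) pairs defining pessimistic Q estimates.
        """
        params = [None] * horizon

        def q_pess(h, s, a):
            # Pessimistic value: point estimate minus an elliptical
            # uncertainty bonus, truncated to the feasible value range.
            theta, lam_inv = params[h]
            f = phi(s, a)
            bonus = beta * np.sqrt(f @ lam_inv @ f)
            return float(np.clip(f @ theta - bonus, 0.0, horizon - h))

        for h in reversed(range(horizon)):
            S, A, R, S2 = data[h]
            Phi = np.stack([phi(s, a) for s, a in zip(S, A)])   # (n, dim)
            if h == horizon - 1:
                y = np.asarray(R, dtype=float)                  # terminal step
            else:
                # Regression target: reward plus pessimistic next-state value.
                y = np.asarray(R, dtype=float) + np.array(
                    [max(q_pess(h + 1, s2, a) for a in range(num_actions))
                     for s2 in S2])
            cov = Phi.T @ Phi + lam * np.eye(dim)               # regularized covariance
            theta = np.linalg.solve(cov, Phi.T @ y)             # ridge least-squares fit
            params[h] = (theta, np.linalg.inv(cov))
        return params

A greedy policy reads off the argmax over actions of the pessimistic Q at each step. For a genuinely nonlinear differentiable model, the closed-form ridge solve would be replaced by gradient-based fitting, and phi by the gradient of the fitted model; controlling the error of exactly this substitution is what the paper's analysis addresses.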

Original language: English (US)
State: Published - 2023
Externally published: Yes
Event: 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda
Duration: May 1, 2023 – May 5, 2023

Conference

Conference: 11th International Conference on Learning Representations, ICLR 2023
Country/Territory: Rwanda
City: Kigali
Period: 5/1/23 – 5/5/23

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language

