Offline Reinforcement Learning with Realizability and Single-policy Concentrability

Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee

Research output: Contribution to journal › Conference article › peer-review

26 Scopus citations

Abstract

Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability). Despite recent efforts to relax these assumptions, existing works relax only one of the two factors, leaving the strong assumption on the other intact. This raises an important open problem: can we achieve sample-efficient offline RL with weak assumptions on both factors? In this paper, we answer this question in the affirmative. We analyze a simple algorithm based on the primal-dual formulation of MDPs, in which the dual variables (discounted occupancies) are modeled by a density-ratio function against the offline data. With proper regularization, the algorithm enjoys polynomial sample complexity under only realizability and single-policy concentrability. We also provide alternative analyses based on different assumptions to shed light on the nature of primal-dual algorithms for offline RL.
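To make the density-ratio view of the dual variables concrete, the sketch below works on a small tabular MDP with a known model. It is an illustration under stated assumptions, not the authors' algorithm: there is no regularization, no function approximation, and no sampling. The helper names `occupancy` and `lagrangian`, the random MDP, and the data distribution `d_D` are all hypothetical. The code computes the normalized discounted occupancy d of a policy and writes it as a ratio w = d / d_D against the data distribution. It then checks two facts. First, the standard LP Lagrangian of a discounted MDP, with the occupancy expressed through w, equals E_d[r] for every value function v. Second, E_d[r] / (1 - gamma) recovers the policy's value. The largest entry of w is the concentrability coefficient of that single policy, under one common definition.

```python
import numpy as np


def occupancy(P, pi, mu0, gamma):
    """Normalized discounted occupancy d^pi(s, a) of a tabular MDP (hypothetical helper).

    P: transitions, shape (S, A, S), with P[s, a, s'] = Pr(s' | s, a)
    pi: stochastic policy, shape (S, A), rows sum to 1
    mu0: initial-state distribution, shape (S,)
    """
    S = mu0.shape[0]
    # State-to-state kernel under pi: P_pi[s, s'] = sum_a pi(a|s) P(s'|s,a)
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Bellman flow constraint on states: d_s = (1 - gamma) mu0 + gamma P_pi^T d_s
    d_s = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d_s[:, None] * pi


def lagrangian(v, w, P, r, mu0, d_D, gamma):
    """Unregularized LP Lagrangian with the occupancy written as d = w * d_D.

    (1 - gamma) E_mu0[v(s)] + E_{d_D}[w(s,a) (r(s,a) + gamma E[v(s')|s,a] - v(s))]
    """
    Pv = P @ v                        # E[v(s') | s, a], shape (S, A)
    td = r + gamma * Pv - v[:, None]  # Bellman residual of v at each (s, a)
    return (1 - gamma) * mu0 @ v + np.sum(d_D * w * td)


rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))          # random transitions, (S, A, S)
r = rng.uniform(size=(S, A))                        # rewards in [0, 1)
mu0 = np.full(S, 1.0 / S)                           # uniform initial states
pi = rng.dirichlet(np.ones(A), size=S)              # target policy, (S, A)
d_D = rng.dirichlet(np.ones(S * A)).reshape(S, A)   # hypothetical offline data distribution

d = occupancy(P, pi, mu0, gamma)
w = d / d_D   # density ratio of the target policy's occupancy against the data
C = w.max()   # concentrability coefficient of this single policy

# Cross-check: J(pi) from direct policy evaluation equals E_d[r] / (1 - gamma)
r_pi = np.sum(pi * r, axis=1)
V = np.linalg.solve(np.eye(S) - gamma * np.einsum("sa,sat->st", pi, P), r_pi)
J = mu0 @ V

# For a feasible w, the Lagrangian equals E_d[r] for every choice of v
v_any = rng.normal(size=S)
print(f"C = {C:.3f}, J = {J:.4f}")
print(np.isclose(lagrangian(v_any, w, P, r, mu0, d_D, gamma), np.sum(d * r)))
```

The identity holds because the Bellman flow constraint cancels every v-dependent term. This is why the dual side can be parameterized by w alone. Note that only the chosen policy's ratio needs to be bounded here. Coverage assumptions over all policies would instead require bounding such ratios for every policy.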

Original language: English (US)
Pages (from-to): 2730-2775
Number of pages: 46
Journal: Proceedings of Machine Learning Research
Volume: 178
State: Published - 2022
Event: 35th Conference on Learning Theory, COLT 2022 - London, United Kingdom
Duration: Jul 2 2022 to Jul 5 2022

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Keywords

  • offline RL
  • primal-dual
  • reinforcement learning theory
