Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

This paper makes progress toward learning Nash equilibria in two-player, zero-sum Markov games from offline data. Specifically, consider a γ-discounted, infinite-horizon Markov game with S states, in which the max-player has A actions and the min-player has B actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds, called value iteration with lower confidence bounds for zero-sum Markov games, that provably finds an ε-approximate Nash equilibrium with a sample complexity no larger than C⋆_clipped S(A + B) / ((1 − γ)³ ε²) (up to some log factor). Here, C⋆_clipped is a unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-à-vis the target data), and the target accuracy ε can be any value within (0, 1/(1 − γ)]. Our sample complexity bound strengthens prior art by a factor of min{A, B}, achieving minimax optimality for a broad regime of interest. An appealing feature of our result lies in its algorithmic simplicity, which shows that neither variance reduction nor sample splitting is necessary for achieving sample optimality.
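Since the abstract describes the algorithm only at a high level, the following is a minimal Python sketch of the underlying idea: pessimistic model-based value iteration with Bernstein-style lower confidence bounds for a zero-sum Markov game. The function names (matrix_game_value, vi_lcb_game), the penalty constant c_b, and the offline-data format (count and reward arrays) are illustrative assumptions, not the paper's reference implementation; log factors in the true Bernstein bonus are folded into c_b.

```python
# Illustrative sketch only; assumptions noted inline, not the authors' code.
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q):
    """Value of the zero-sum matrix game max_x min_y x^T Q y, via an LP."""
    A, B = Q.shape
    c = np.zeros(A + 1)
    c[-1] = -1.0                                # maximize v <=> minimize -v
    A_ub = np.hstack([-Q.T, np.ones((B, 1))])   # v <= x^T Q e_b for every b
    b_ub = np.zeros(B)
    A_eq = np.ones((1, A + 1))
    A_eq[0, -1] = 0.0                           # mixed strategy x sums to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * A + [(None, None)])
    return -res.fun                             # game value v*

def vi_lcb_game(counts, rewards, gamma=0.9, n_iters=200, c_b=2.0):
    """Pessimistic value iteration from offline data (a sketch).

    counts : (S, A, B, S) empirical transition counts from the dataset.
    rewards: (S, A, B) rewards in [0, 1].
    c_b    : assumed penalty constant absorbing log factors.
    """
    S, A, B, _ = counts.shape
    n = counts.sum(axis=-1)                          # visit counts N(s, a, b)
    P_hat = counts / np.maximum(n[..., None], 1)     # empirical transition model
    V = np.zeros(S)
    for _ in range(n_iters):
        EV = P_hat @ V                               # (S, A, B) expected next value
        # Bernstein-style penalty: scales with the empirical std of V under
        # P_hat and shrinks as 1/sqrt(N), plus a lower-order 1/N term.
        var = np.maximum(P_hat @ (V ** 2) - EV ** 2, 0.0)
        penalty = c_b * (np.sqrt(var / np.maximum(n, 1))
                         + 1.0 / ((1.0 - gamma) * np.maximum(n, 1)))
        Q_lcb = np.clip(rewards + gamma * EV - penalty, 0.0, 1.0 / (1.0 - gamma))
        # Each state's update solves a matrix game on the pessimistic Q.
        V = np.array([matrix_game_value(Q_lcb[s]) for s in range(S)])
    return V
```

Each iteration solves one matrix game per state via a linear program, and pessimism enters only through the penalty subtracted from the empirical Q-values. This mirrors the abstract's point about algorithmic simplicity: no variance reduction or sample splitting appears anywhere in the loop.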

Original language: English (US)
Pages (from-to): 2430-2445
Number of pages: 16
Journal: Operations Research
Volume: 72
Issue number: 6
DOIs
State: Published - Nov 2024
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Management Science and Operations Research

Keywords

  • Nash equilibrium
  • curse of multiple agents
  • minimax optimality
  • model-based approach
  • offline RL
  • unilateral coverage
  • zero-sum Markov games

