Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games

Yulai Zhao, Yuandong Tian, Jason D. Lee, Simon S. Du

Research output: Contribution to journal › Conference article › peer-review

11 Scopus citations

Abstract

Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces. However, it remains elusive how to obtain optimization and statistical guarantees for such algorithms. We present a new policy optimization algorithm with function approximation and prove that under standard regularity conditions on the Markov game and the function approximation class, our algorithm finds a near-optimal policy within a polynomial number of samples and iterations. To our knowledge, this is the first provably efficient policy optimization algorithm with function approximation that solves two-player zero-sum Markov games.
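To make the setting concrete, the following is a minimal, hedged sketch (not the algorithm analyzed in the paper): it solves a single-state zero-sum game with exact payoffs using independent multiplicative-weights policy updates and averaged iterates, illustrating the kind of two-player policy optimization the abstract refers to. The payoff matrix, step size, and iteration count are illustrative choices, and the full paper's setting (Markov games, sampling, function approximation) is not modeled here.

```python
import numpy as np

# Hedged illustration only; this is not the paper's algorithm.
# Two players run independent multiplicative-weights (softmax-style)
# policy updates on a zero-sum matrix game; their *averaged* policies
# approach a Nash equilibrium.

A = np.array([[ 0.0, -1.0,  1.0],   # rock-paper-scissors payoff for the row (max) player
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])

eta = 0.1                      # step size (illustrative choice)
T = 5000                       # number of iterations (illustrative choice)
x = np.ones(3) / 3             # row player's mixed policy (maximizer)
y = np.ones(3) / 3             # column player's mixed policy (minimizer)
x_avg = np.zeros(3)
y_avg = np.zeros(3)

for _ in range(T):
    gx = A @ y                 # row player's expected payoff per action
    gy = -(A.T @ x)            # column player's expected payoff per action (zero-sum)

    # Multiplicative-weights policy update for each player
    x = x * np.exp(eta * gx); x /= x.sum()
    y = y * np.exp(eta * gy); y /= y.sum()

    x_avg += x
    y_avg += y

x_avg /= T
y_avg /= T

# The duality gap of the averaged policies shrinks toward 0,
# certifying an approximate Nash equilibrium (uniform play here).
gap = (A @ y_avg).max() - (x_avg @ A).min()
print("avg row policy:", x_avg.round(3))
print("avg col policy:", y_avg.round(3))
print("duality gap:", gap)
```

Averaging the iterates is what gives convergence here; the last iterates of simultaneous updates can cycle in zero-sum games, which is one reason provable guarantees for policy optimization in this setting are nontrivial.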

Original language: English (US)
Pages (from-to): 2736-2761
Number of pages: 26
Journal: Proceedings of Machine Learning Research
Volume: 151
State: Published - 2022
Event: 25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022 - Virtual, Online, Spain
Duration: Mar 28 2022 - Mar 30 2022

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
