Abstract
Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces. However, it remains elusive how to obtain optimization and statistical guarantees for such algorithms. We present a new policy optimization algorithm with function approximation and prove that under standard regularity conditions on the Markov game and the function approximation class, our algorithm finds a near-optimal policy within a polynomial number of samples and iterations. To our knowledge, this is the first provably efficient policy optimization algorithm with function approximation that solves two-player zero-sum Markov games.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 2736-2761 |
| Number of pages | 26 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 151 |
| State | Published - 2022 |
| Event | 25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022 - Virtual, Online, Spain Duration: Mar 28 2022 → Mar 30 2022 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability
Fingerprint
Dive into the research topics of 'Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver