TY - GEN
T1 - Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent
AU - Bai, Yu
AU - Jin, Chi
AU - Mei, Song
AU - Song, Ziang
AU - Yu, Tiancheng
N1 - Publisher Copyright:
© 2022 Neural information processing systems foundation. All rights reserved.
PY - 2022
Y1 - 2022
N2 - A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs). This approach enables us to directly translate state-of-the-art techniques and analyses in NFGs to learning EFGs, but typically suffers from computational intractability due to the exponential blow-up of the game size introduced by the conversion. In this paper, we address this problem in natural and important setups for the Φ-Hedge algorithm, a generic algorithm capable of learning a large class of equilibria for NFGs. We show that Φ-Hedge can be directly used to learn Nash Equilibria (zero-sum settings), Normal-Form Coarse Correlated Equilibria (NFCCE), and Extensive-Form Correlated Equilibria (EFCE) in EFGs. We prove that, in those settings, the Φ-Hedge algorithms are equivalent to standard Online Mirror Descent (OMD) algorithms for EFGs with suitable dilated regularizers, and run in polynomial time. This new connection further allows us to design and analyze a new class of OMD algorithms based on modifying its log-partition function. In particular, we design an improved algorithm with balancing techniques that achieves a sharp Õ(√XAT) EFCE-regret under bandit feedback in an EFG with X information sets, A actions, and T episodes. To the best of our knowledge, this is the first such rate, and it matches the information-theoretic lower bound.
AB - A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs). This approach enables us to directly translate state-of-the-art techniques and analyses in NFGs to learning EFGs, but typically suffers from computational intractability due to the exponential blow-up of the game size introduced by the conversion. In this paper, we address this problem in natural and important setups for the Φ-Hedge algorithm, a generic algorithm capable of learning a large class of equilibria for NFGs. We show that Φ-Hedge can be directly used to learn Nash Equilibria (zero-sum settings), Normal-Form Coarse Correlated Equilibria (NFCCE), and Extensive-Form Correlated Equilibria (EFCE) in EFGs. We prove that, in those settings, the Φ-Hedge algorithms are equivalent to standard Online Mirror Descent (OMD) algorithms for EFGs with suitable dilated regularizers, and run in polynomial time. This new connection further allows us to design and analyze a new class of OMD algorithms based on modifying its log-partition function. In particular, we design an improved algorithm with balancing techniques that achieves a sharp Õ(√XAT) EFCE-regret under bandit feedback in an EFG with X information sets, A actions, and T episodes. To the best of our knowledge, this is the first such rate, and it matches the information-theoretic lower bound.
UR - http://www.scopus.com/inward/record.url?scp=85163165995&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163165995&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85163165995
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
A2 - Koyejo, S.
A2 - Mohamed, S.
A2 - Agarwal, A.
A2 - Belgrave, D.
A2 - Cho, K.
A2 - Oh, A.
PB - Neural Information Processing Systems Foundation
T2 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Y2 - 28 November 2022 through 9 December 2022
ER -