TY - GEN

T1 - Policy Optimization for Linear-Quadratic Zero-Sum Mean-Field Type Games

AU - Carmona, René

AU - Hamidouche, Kenza

AU - Laurière, Mathieu

AU - Tan, Zongjun

N1 - Funding Information:
The work of M. Laurière was supported by NSF grant DMS–1716673 and ARO grant W911NF–17–1–0578. The work of K. Hamidouche was supported by Neutigers.
Publisher Copyright:
© 2020 IEEE.

PY - 2020/12/14

Y1 - 2020/12/14

N2 - In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic utility are studied under an infinite-horizon discounted utility function. ZSMFTG are a class of games in which two decision makers, whose utilities sum to zero, compete to influence a large population of agents. In particular, the case in which the transition and utility functions depend on the state, the actions of the controllers, and the means of the state and the actions is investigated. The game is analyzed and explicit expressions for the Nash equilibrium strategies are derived. Moreover, two policy optimization methods relying on policy gradient are proposed, one for the model-based framework and one for the sample-based framework. In the first case, the gradients are computed exactly using the model, whereas in the second case they are estimated via Monte Carlo simulations. Numerical experiments show the convergence of the two players' controls, as well as of the utility function, when the two algorithms are applied in different scenarios.

AB - In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic utility are studied under an infinite-horizon discounted utility function. ZSMFTG are a class of games in which two decision makers, whose utilities sum to zero, compete to influence a large population of agents. In particular, the case in which the transition and utility functions depend on the state, the actions of the controllers, and the means of the state and the actions is investigated. The game is analyzed and explicit expressions for the Nash equilibrium strategies are derived. Moreover, two policy optimization methods relying on policy gradient are proposed, one for the model-based framework and one for the sample-based framework. In the first case, the gradients are computed exactly using the model, whereas in the second case they are estimated via Monte Carlo simulations. Numerical experiments show the convergence of the two players' controls, as well as of the utility function, when the two algorithms are applied in different scenarios.

UR - http://www.scopus.com/inward/record.url?scp=85099884476&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85099884476&partnerID=8YFLogxK

U2 - 10.1109/CDC42340.2020.9303734

DO - 10.1109/CDC42340.2020.9303734

M3 - Conference contribution

AN - SCOPUS:85099884476

T3 - Proceedings of the IEEE Conference on Decision and Control

SP - 1038

EP - 1043

BT - 2020 59th IEEE Conference on Decision and Control, CDC 2020

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 59th IEEE Conference on Decision and Control, CDC 2020

Y2 - 14 December 2020 through 18 December 2020

ER -