Multi-Agent Reinforcement Learning with General Utilities via Decentralized Shadow Reward Actor-Critic

Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

We posit a new mechanism for cooperation in multi-agent reinforcement learning (MARL) based upon any nonlinear function of the team’s long-term state-action occupancy measure, i.e., a general utility. This subsumes the cumulative return but also allows one to incorporate risk-sensitivity, exploration, and priors. We derive the Decentralized Shadow Reward Actor-Critic (DSAC), in which agents alternate between policy evaluation (critic), weighted averaging with neighbors (information mixing), and local gradient updates for their policy parameters (actor). DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the “shadow reward”. DSAC converges to a stationary point at a sublinear rate with high probability, where the rate depends on the amount of communication. Under proper conditions, we further establish the non-existence of spurious stationary points for this problem; that is, DSAC finds the globally optimal policy. Experiments demonstrate the merits of goals beyond the cumulative return in cooperative MARL.
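The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The single-trajectory occupancy estimator, the choice of entropy as the general utility, and the function names (`estimate_occupancy`, `entropy_utility`, `shadow_reward`, `mix`) are all assumptions made here for illustration. The sketch shows a discounted occupancy estimate, the shadow reward as the gradient of the utility with respect to that estimate, and one round of weighted averaging with neighbors.

```python
import math


def estimate_occupancy(trajectory, n_states, n_actions, gamma):
    """Empirical discounted state-action occupancy from one finite rollout.

    `trajectory` is a list of (state, action) index pairs. Using a single
    truncated rollout is an illustrative assumption, so the entries need
    not sum to exactly 1.
    """
    occ = [[0.0] * n_actions for _ in range(n_states)]
    for t, (s, a) in enumerate(trajectory):
        occ[s][a] += (1.0 - gamma) * gamma ** t
    return occ


def entropy_utility(occ, eps=1e-8):
    """Example general utility (assumed here): F(lam) = -sum lam*log(lam+eps)."""
    return -sum(x * math.log(x + eps) for row in occ for x in row)


def shadow_reward(occ, eps=1e-8):
    """Partial derivative of entropy_utility with respect to each entry."""
    return [[-(math.log(x + eps) + x / (x + eps)) for x in row] for row in occ]


def mix(params, weights):
    """One mixing round: agent i takes sum_j weights[i][j] * params[j].

    `weights` is assumed to be a row-stochastic matrix over the agents.
    """
    n, dim = len(params), len(params[0])
    return [
        [sum(weights[i][j] * params[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]


if __name__ == "__main__":
    occ = estimate_occupancy([(0, 0), (1, 1), (0, 0)], 2, 2, gamma=0.5)
    print(shadow_reward(occ))
    print(mix([[0.0], [2.0]], [[0.5, 0.5], [0.5, 0.5]]))
```

With the entropy utility, state-action pairs that were never visited receive a larger shadow reward than frequently visited ones, which is one way a general utility can encode an exploration incentive.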

Original language: English (US)
Title of host publication: AAAI-22 Technical Tracks 8
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 9031-9039
Number of pages: 9
ISBN (Electronic): 1577358767, 9781577358763
DOIs
State: Published - Jun 30 2022
Event: 36th AAAI Conference on Artificial Intelligence, AAAI 2022 - Virtual, Online
Duration: Feb 22 2022 to Mar 1 2022

Publication series

Name: Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
Volume: 36

Conference

Conference: 36th AAAI Conference on Artificial Intelligence, AAAI 2022
City: Virtual, Online
Period: 2/22/22 to 3/1/22

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
