Intermittent Communications in Decentralized Shadow Reward Actor-Critic

Amrit Singh Bedi, Alec Koppel, Mengdi Wang, Junyu Zhang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Broader decision-making goals such as risk-sensitivity, exploration, and incorporating prior experience motivate the study of cooperative multi-agent reinforcement learning (MARL) problems where the objective is any nonlinear function of the team's long-term state-action occupancy measure, i.e., a general utility, which subsumes the aforementioned goals. Existing decentralized actor-critic algorithms to solve this problem require extensive message passing per policy update, which may be impractical. Thus, we put forth Communication-Efficient Decentralized Shadow Reward Actor-Critic (CE-DSAC) that may operate with time-varying or event-triggered network connectivities. This scheme operates by having agents alternate between policy evaluation (critic), weighted averaging with neighbors (information mixing), and local gradient updates for their policy parameters (actor). CE-DSAC differs from the usual critic update in two respects: its local occupancy measure estimation step, which is needed to estimate the derivative of each agent's local utility with respect to its occupancy measure, i.e., the "shadow reward," and the number of local weighted averaging steps executed by agents. This scheme improves existing tradeoffs between communications and convergence: to obtain ϵ-stationarity, we require $\mathcal{O}(1/\epsilon^{2.5})$ (Theorem IV.6) or a faster $\mathcal{O}(1/\epsilon^{2})$ (Corollary IV.8) steps with high probability. Experiments demonstrate the merits of this approach for multiple RL agents solving cooperative navigation tasks with intermittent communications.
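The three ingredients named in the abstract (occupancy measure estimation, the shadow reward, and weighted averaging with neighbors) can be illustrated with a minimal tabular sketch. This is not the authors' CE-DSAC implementation. The entropy utility, the discount factor, the single-trajectory estimator, the mixing matrix `W`, and all function names are assumptions chosen only for illustration.

```python
import math

def empirical_occupancy(trajectory, n_states, n_actions, gamma=0.9):
    """Truncated discounted state-action occupancy estimate from one trajectory.

    trajectory is a list of (state, action) index pairs (hypothetical format).
    """
    occ = [[0.0] * n_actions for _ in range(n_states)]
    for t, (s, a) in enumerate(trajectory):
        occ[s][a] += (1.0 - gamma) * gamma ** t
    return occ

def entropy_shadow_reward(occ, eps=1e-8):
    """Shadow reward for an assumed entropy utility F(lam) = -sum lam * log(lam).

    The partial derivative is -(log(lam) + 1); eps only guards against log(0).
    """
    return [[-(math.log(x + eps) + 1.0) for x in row] for row in occ]

def mix(values, weights):
    """One round of weighted averaging: agent i takes sum_j weights[i][j] * values[j]."""
    n, dim = len(values), len(values[0])
    return [[sum(weights[i][j] * values[j][k] for j in range(n))
             for k in range(dim)] for i in range(n)]

# Usage: one agent's occupancy estimate and shadow reward.
traj = [(0, 1), (1, 0), (0, 1)]
occ = empirical_occupancy(traj, n_states=2, n_actions=2)
shadow = entropy_shadow_reward(occ)

# Usage: three agents on a line graph, assumed doubly stochastic mixing matrix.
W = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
local = [[1.0], [2.0], [6.0]]
mixed = local
for _ in range(50):
    mixed = mix(mixed, W)
```

Under the assumed entropy utility, rarely visited state-action pairs receive a larger shadow reward, which is how a general utility can encode exploration. Because `W` is doubly stochastic, repeated mixing drives every agent toward the network-wide average, and fewer mixing rounds trade consensus accuracy for less communication.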

Original language: English (US)
Title of host publication: 60th IEEE Conference on Decision and Control, CDC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2613-2620
Number of pages: 8
ISBN (Electronic): 9781665436595
State: Published - 2021
Externally published: Yes
Event: 60th IEEE Conference on Decision and Control, CDC 2021 - Austin, United States
Duration: Dec 13 2021 to Dec 17 2021

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2021-December
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 60th IEEE Conference on Decision and Control, CDC 2021
Country/Territory: United States
City: Austin
Period: 12/13/21 to 12/17/21

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
