Teamwork Reinforcement Learning With Concave Utilities

Zheng Yu, Junyu Zhang, Zheng Wen, Andrea Tacchetti, Mengdi Wang, Ian Gemp

Research output: Contribution to journal › Article › peer-review

Abstract

Complex reinforcement learning (RL) tasks often require a divide-and-conquer approach, where a large task is divided into pieces and solved by individual agents. In this paper, we study a teamwork RL setting where individual agents make decisions on disjoint subsets (blocks) of the state space and have private interests (reward functions), while the entire team aims to maximize a general long-term team utility function and may be subject to constraints. This team utility, which is not necessarily a cumulative sum of rewards, is modeled as a nonlinear function of the team's joint state-action occupancy distribution. By leveraging the inherent duality of policy optimization, we propose a min-max multi-block policy optimization framework to decompose the overall problem into individual local tasks. This enables a federated teamwork mechanism where a team lead coordinates individual agents via reward shaping, and each agent solves its local task defined only on its local state subset. We analyze the convergence of this teamwork policy optimization mechanism and establish an O(1/T) convergence rate to the team's joint optimum. This mechanism allows team members to jointly find the global socially optimal policy while keeping their local privacy.
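
As a rough illustration of what "a nonlinear function of the joint state-action occupancy distribution" can mean, the sketch below computes the normalized discounted occupancy of a fixed policy on a toy two-state MDP and evaluates a concave utility on it. Everything here is an assumption made for illustration: the MDP, the names `occupancy` and `entropy_utility`, and the choice of Shannon entropy as the utility are not taken from the paper, and the min-max decomposition and reward-shaping mechanism are not implemented.

```python
import numpy as np

def occupancy(P, pi, mu0, gamma):
    """Normalized discounted state-action occupancy of a policy.

    P:   transition tensor, shape (S, A, S), P[s, a, t] = Pr(t | s, a)
    pi:  policy, shape (S, A), each row sums to 1
    mu0: initial state distribution, shape (S,)
    """
    S = P.shape[0]
    # State-to-state kernel under pi: P_pi[s, t] = sum_a pi[s, a] * P[s, a, t]
    P_pi = np.einsum("sa,sat->st", pi, P)
    # State occupancy d solves d = (1 - gamma) * mu0 + gamma * P_pi^T d
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    # Joint occupancy: lam[s, a] = d[s] * pi[s, a]
    return d[:, None] * pi

def entropy_utility(lam):
    """Shannon entropy of the occupancy: one example of a concave utility
    (an illustrative choice, not the paper's)."""
    lam = np.clip(lam, 1e-12, None)  # avoid log(0)
    return float(-(lam * np.log(lam)).sum())

# Hypothetical toy MDP with 2 states and 2 actions.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
mu0 = np.array([1.0, 0.0])
gamma = 0.9

pi1 = np.array([[0.9, 0.1], [0.9, 0.1]])
pi2 = np.array([[0.1, 0.9], [0.1, 0.9]])
lam1 = occupancy(P, pi1, mu0, gamma)
lam2 = occupancy(P, pi2, mu0, gamma)

# The utility is evaluated on the occupancy itself, not on a sum of rewards.
u1 = entropy_utility(lam1)
u2 = entropy_utility(lam2)
u_mix = entropy_utility(0.5 * lam1 + 0.5 * lam2)
```

Because the utility is concave in the occupancy, `u_mix` is at least the average of `u1` and `u2`. A cumulative-reward objective is the special case where the utility is linear in the occupancy, which is why a general concave utility cannot be rewritten as a fixed per-step reward.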

Original language: English (US)
Pages (from-to): 5709-5721
Number of pages: 13
Journal: IEEE Transactions on Mobile Computing
Volume: 23
Issue number: 5
State: Published - May 1 2024

All Science Journal Classification (ASJC) codes

  • Software
  • Electrical and Electronic Engineering
  • Computer Networks and Communications

Keywords

  • Reinforcement learning
  • multi-agent planning and learning
  • multi-block min-max optimization
  • non-Markovian reward
