Teamwork Reinforcement Learning With Concave Utilities

Zheng Yu, Junyu Zhang, Zheng Wen, Andrea Tacchetti, Mengdi Wang, Ian Gemp

Research output: Contribution to journal › Article › peer-review

Abstract

Complex reinforcement learning (RL) tasks often require a divide-and-conquer approach, where a large task is divided into pieces and solved by individual agents. In this paper, we study a teamwork RL setting where individual agents make decisions on disjoint subsets (blocks) of the state space and have private interests (reward functions), while the entire team aims to maximize a general long-term team utility function and may be subject to constraints. This team utility, which is not necessarily a cumulative sum of rewards, is modeled as a nonlinear function of the team's joint state-action occupancy distribution. By leveraging the inherent duality of policy optimization, we propose a min-max multi-block policy optimization framework to decompose the overall problem into individual local tasks. This enables a federated teamwork mechanism where a team lead coordinates individual agents via reward shaping, and each agent solves its local task defined only on its local state subset. We analyze the convergence of this teamwork policy optimization mechanism and establish an $O(1/T)$ convergence rate to the team's joint optimum. This mechanism allows team members to jointly find the global socially optimal policy while keeping their local privacy.
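
The min-max reformulation mentioned in the abstract can be illustrated with standard conjugate duality for a concave utility. The following is only a sketch under assumed notation, not necessarily the paper's exact formulation: $\lambda^{\pi}$ (the joint state-action occupancy distribution of policy $\pi$), $F$ (a concave, upper semicontinuous team utility), $F_*$ (its concave conjugate), $z$ (a dual variable acting as a shaped reward), and $K$ (the number of blocks) are all symbols introduced here for illustration. Constraints are omitted.

```latex
% Concave conjugate of the team utility F (assumed notation):
F_*(z) \;=\; \inf_{\lambda}\,\bigl\{ \langle z, \lambda \rangle - F(\lambda) \bigr\}

% For concave, upper semicontinuous F, biconjugation gives:
F(\lambda) \;=\; \inf_{z}\,\bigl\{ \langle z, \lambda \rangle - F_*(z) \bigr\}

% Hence maximizing the team utility becomes a max-min problem:
\max_{\pi} F(\lambda^{\pi})
  \;=\; \max_{\pi}\,\inf_{z}\,\bigl\{ \langle z, \lambda^{\pi} \rangle - F_*(z) \bigr\}

% With the state space split into K disjoint blocks, the linear term
% separates into per-block terms (z_k, \lambda_k are restrictions to block k):
\langle z, \lambda^{\pi} \rangle \;=\; \sum_{k=1}^{K} \langle z_k, \lambda^{\pi}_k \rangle
```

Under this reading, for a fixed $z$ each agent faces a term that is linear in its own block's occupancy, with $z_k$ playing the role of a shaped reward issued by the team lead. How the coupling between blocks is handled is not stated in the abstract and is not captured by this sketch.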

Original language: English (US)
Pages (from-to): 1-12
Number of pages: 12
Journal: IEEE Transactions on Mobile Computing
State: Accepted/In press - 2023

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications
  • Electrical and Electronic Engineering

Keywords

  • Convergence
  • Games
  • Lead
  • Multi-Agent planning and learning
  • multi-block min-max optimization
  • non-markovian reward
  • Optimization
  • Reinforcement learning
  • Task analysis
  • Teamwork
