QD-learning: A collaborative distributed strategy for multi-agent reinforcement learning through consensus + innovations

Soummya Kar, José M.F. Moura, H. Vincent Poor

Research output: Contribution to journal › Article › peer-review

139 Scopus citations

Abstract

The paper develops QD-learning, a distributed version of reinforcement Q-learning, for multi-agent Markov decision processes (MDPs); the agents have no prior information on the global state transition or on the local agent cost statistics. The network agents minimize a network-averaged infinite-horizon discounted cost by local processing and by collaborating through mutual information exchange over a sparse (possibly stochastic) communication network. The agents respond differently (depending on their instantaneous one-stage random costs) to a global controlled state and to the control actions of a remote controller. When each agent is aware only of its local online cost data and the inter-agent communication network is weakly connected, we prove that QD-learning, a consensus + innovations algorithm with mixed time-scale stochastic dynamics, converges asymptotically almost surely to the desired value function and to the optimal stationary control policy at each network agent.
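The abstract describes the consensus + innovations idea only qualitatively. The Python sketch below illustrates it under stated assumptions and is not the authors' exact QD-learning recursion or step-size conditions. Everything in it is hypothetical: a tabular MDP with 3 states and 2 actions, 4 agents on a ring graph, uniformly random exploratory actions observed by all agents, private costs corrupted by uniform noise, and the helper names `make_problem`, `centralized_q`, and `consensus_innovations_q`. On each visit to the pair (x, u), agent n applies Q_n(x,u) ← Q_n(x,u) − β_k Σ_{l ∈ neighbors(n)} (Q_n(x,u) − Q_l(x,u)) + α_k (c_n + γ min_v Q_n(x',v) − Q_n(x,u)). The consensus weight β_k is assumed to decay more slowly than the innovation weight α_k, to mimic mixed time-scale dynamics; the constants and exponents are illustrative.

```python
import random

# All sizes, costs, and step-size constants are hypothetical illustration values.
N_AGENTS, N_STATES, N_ACTIONS = 4, 3, 2
GAMMA = 0.5                 # discount factor
A, TAU1 = 0.5, 0.80         # innovation weight: alpha_k = A / (k + 1) ** TAU1
B, TAU2 = 0.3, 0.25         # consensus weight:  beta_k  = B / (k + 1) ** TAU2 (decays slower)


def make_problem(rng):
    """Random global kernel P[s][a][s'] and private mean costs c[n][s][a] in [0, 1)."""
    P = []
    for _ in range(N_STATES):
        row = []
        for _ in range(N_ACTIONS):
            w = [rng.random() + 0.1 for _ in range(N_STATES)]
            total = sum(w)
            row.append([x / total for x in w])
        P.append(row)
    c = [[[rng.random() for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
         for _ in range(N_AGENTS)]
    return P, c


def centralized_q(P, c):
    """Reference Q*: value iteration on the network-averaged cost (needs the full model)."""
    cbar = [[sum(c[n][s][a] for n in range(N_AGENTS)) / N_AGENTS
             for a in range(N_ACTIONS)] for s in range(N_STATES)]
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(200):
        V = [min(row) for row in Q]
        Q = [[cbar[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(N_STATES))
              for a in range(N_ACTIONS)] for s in range(N_STATES)]
    return Q


def consensus_innovations_q(P, c, steps, rng):
    """Each agent keeps its own Q table and sees only its own noisy cost."""
    ring = [[(n - 1) % N_AGENTS, (n + 1) % N_AGENTS] for n in range(N_AGENTS)]
    Q = [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in range(N_AGENTS)]
    visits = [[0] * N_ACTIONS for _ in range(N_STATES)]
    x = 0
    for _ in range(steps):
        u = rng.randrange(N_ACTIONS)                  # exploratory action, seen by all agents
        x_next = rng.choices(range(N_STATES), weights=P[x][u])[0]
        k = visits[x][u]
        alpha, beta = A / (k + 1) ** TAU1, B / (k + 1) ** TAU2
        old = [Q[n][x][u] for n in range(N_AGENTS)]   # snapshot before any agent updates
        new = []
        for n in range(N_AGENTS):
            cost = c[n][x][u] + rng.uniform(-0.1, 0.1)               # private one-stage cost
            consensus = sum(old[n] - old[l] for l in ring[n])        # disagreement with neighbors
            innovation = cost + GAMMA * min(Q[n][x_next]) - old[n]   # local TD error
            new.append(old[n] - beta * consensus + alpha * innovation)
        for n in range(N_AGENTS):
            Q[n][x][u] = new[n]
        visits[x][u] += 1
        x = x_next
    return Q


if __name__ == "__main__":
    rng = random.Random(1)
    P, c = make_problem(rng)
    q_star = centralized_q(P, c)
    q_agents = consensus_innovations_q(P, c, 60_000, rng)
    err = max(abs(q_agents[n][s][a] - q_star[s][a])
              for n in range(N_AGENTS) for s in range(N_STATES) for a in range(N_ACTIONS))
    print(f"max |Q_n - Q*| over all agents: {err:.4f}")
```

The reference `centralized_q` uses the full transition model and the averaged cost, neither of which any single agent has. Because the consensus weight dominates asymptotically, the agents' tables should agree with one another and approach that reference, even though each agent only ever sees its own cost stream.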

Original language: English (US)
Article number: 6415291
Pages (from-to): 1848-1862
Number of pages: 15
Journal: IEEE Transactions on Signal Processing
Volume: 61
Issue number: 7
State: Published - Apr 1 2013

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Electrical and Electronic Engineering

Keywords

  • Collaborative network processing
  • Consensus + innovations
  • Distributed QD-learning
  • Mixed time-scale dynamics
  • Multi-agent stochastic control
  • Reinforcement learning
