Abstract
The paper develops QD-learning, a distributed version of reinforcement Q-learning, for multi-agent Markov decision processes (MDPs); the agents have no prior information on the global state transition statistics or on the local agent cost statistics. The network agents minimize a network-averaged infinite-horizon discounted cost by local processing and by collaborating through mutual information exchange over a sparse (possibly stochastic) communication network. The agents respond differently (depending on their instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. When each agent is aware only of its local online cost data and the inter-agent communication network is weakly connected, we prove that QD-learning, a consensus + innovations algorithm with mixed time-scale stochastic dynamics, converges asymptotically almost surely to the desired value function and to the optimal stationary control policy at each network agent.
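Abstractly, each QD-learning iterate combines a consensus term (agreement with neighbors' Q-factor estimates) and an innovation term (a local temporal-difference correction driven by the agent's observed one-stage cost), with separate step sizes on the two terms producing the mixed time-scale dynamics mentioned above. The Python fragment below is a minimal, hypothetical sketch of one such local update for a tabular Q-factor; the function name, arguments, and data layout are illustrative assumptions rather than the paper's notation or code, and the decaying step-size schedules required for the convergence results are not reproduced here.

```python
import numpy as np

def qd_learning_step(Q_n, Q_neighbors, state, action, next_state,
                     cost_n, gamma, alpha, beta):
    """Hypothetical consensus + innovations update at one agent.

    Q_n          : ndarray (num_states, num_actions), agent n's Q-factors
    Q_neighbors  : list of ndarrays, Q-factors received from current neighbors
    cost_n       : agent n's instantaneous one-stage cost at (state, action)
    gamma        : discount factor in (0, 1)
    alpha, beta  : innovation and consensus step sizes
    """
    # Consensus term: pull the local estimate toward neighbors' estimates
    # of the same (state, action) Q-factor.
    consensus = sum(Q_n[state, action] - Q_l[state, action]
                    for Q_l in Q_neighbors)

    # Innovation term: local temporal-difference correction from the
    # observed one-stage cost and the greedy continuation value.
    innovation = cost_n + gamma * Q_n[next_state].min() - Q_n[state, action]

    Q_new = Q_n.copy()
    Q_new[state, action] = (Q_n[state, action]
                            - beta * consensus
                            + alpha * innovation)
    return Q_new

# Example: a single update on a toy 3-state, 2-action problem.
Q = np.zeros((3, 2))
Q_next = qd_learning_step(Q, [np.ones((3, 2))], state=0, action=1,
                          next_state=2, cost_n=0.5, gamma=0.9,
                          alpha=0.1, beta=0.05)
```

In the paper the two step-size sequences decay at different rates over time, which is what separates the consensus and innovation time scales; fixed scalars are used here only to keep the sketch self-contained.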
| Original language | English (US) |
| --- | --- |
| Article number | 6415291 |
| Pages (from-to) | 1848-1862 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Signal Processing |
| Volume | 61 |
| Issue number | 7 |
| DOIs | |
| State | Published - Apr 1 2013 |
All Science Journal Classification (ASJC) codes
- Signal Processing
- Electrical and Electronic Engineering
Keywords
- Collaborative network processing
- Consensus + innovations
- Distributed Q-learning
- Mixed time-scale dynamics
- Multi-agent stochastic control
- Reinforcement learning