TY - GEN
T1 - Distributed reinforcement learning in multi-agent networks
AU - Kar, Soummya
AU - Moura, José M.F.
AU - Poor, H. Vincent
PY - 2013
Y1 - 2013
AB - Distributed reinforcement learning algorithms for collaborative multi-agent Markov decision processes (MDPs) are presented and analyzed. The networked setup consists of a collection of agents (learners) that respond differently (depending on their instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. With the objective of jointly learning, in the absence of global state transition and local agent cost statistics, the optimal stationary control policy that minimizes the network-averaged infinite-horizon discounted cost, the paper presents distributed variants of Q-learning of the consensus + innovations type, in which each agent sequentially refines its learning parameters by locally processing its instantaneous payoff data and the information received from neighboring agents. Under broad conditions on the multi-agent decision model and the mean connectivity of the inter-agent communication network, the proposed distributed algorithms are shown to achieve optimal learning asymptotically; i.e., almost surely (a.s.), each network agent learns the value function and the optimal stationary control policy of the collaborative MDP. Further, convergence rate estimates are obtained for the proposed class of distributed learning algorithms.
KW - Multi-agent stochastic control
KW - collaborative network processing
KW - consensus + innovations
KW - distributed Q-learning
KW - distributed stochastic approximation
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=84894160317&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84894160317&partnerID=8YFLogxK
U2 - 10.1109/CAMSAP.2013.6714066
DO - 10.1109/CAMSAP.2013.6714066
M3 - Conference contribution
AN - SCOPUS:84894160317
SN - 9781467331463
T3 - 2013 5th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, CAMSAP 2013
SP - 296
EP - 299
BT - 2013 5th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, CAMSAP 2013
T2 - 2013 5th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, CAMSAP 2013
Y2 - 15 December 2013 through 18 December 2013
ER -