TY - GEN
T1 - Decentralized reinforcement learning
T2 - 37th International Conference on Machine Learning, ICML 2020
AU - Chang, Michael
AU - Kaushik, Sidhant
AU - Weinberg, S. Matthew
AU - Griffiths, Thomas L.
AU - Levine, Sergey
N1 - Publisher Copyright: © 2020 37th International Conference on Machine Learning, ICML 2020. All rights reserved.
PY - 2020
Y1 - 2020
N2 - This paper seeks to establish a framework for directing a society of simple, specialized, self-interested agents to solve what traditionally are posed as monolithic single-agent sequential decision problems. What makes it challenging to use a decentralized approach to collectively optimize a central objective is the difficulty in characterizing the equilibrium strategy profile of noncooperative games. To overcome this challenge, we design a mechanism for defining the learning environment of each agent for which we know that the optimal solution for the global objective coincides with a Nash equilibrium strategy profile of the agents optimizing their own local objectives. The society functions as an economy of agents that learn the credit assignment process itself by buying and selling to each other the right to operate on the environment state. We derive a class of decentralized reinforcement learning algorithms that are broadly applicable not only to standard reinforcement learning but also to selecting options in semi-MDPs and dynamically composing computation graphs. Lastly, we demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
AB - This paper seeks to establish a framework for directing a society of simple, specialized, self-interested agents to solve what traditionally are posed as monolithic single-agent sequential decision problems. What makes it challenging to use a decentralized approach to collectively optimize a central objective is the difficulty in characterizing the equilibrium strategy profile of noncooperative games. To overcome this challenge, we design a mechanism for defining the learning environment of each agent for which we know that the optimal solution for the global objective coincides with a Nash equilibrium strategy profile of the agents optimizing their own local objectives. The society functions as an economy of agents that learn the credit assignment process itself by buying and selling to each other the right to operate on the environment state. We derive a class of decentralized reinforcement learning algorithms that are broadly applicable not only to standard reinforcement learning but also to selecting options in semi-MDPs and dynamically composing computation graphs. Lastly, we demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
UR - http://www.scopus.com/inward/record.url?scp=85105191520&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105191520&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85105191520
T3 - 37th International Conference on Machine Learning, ICML 2020
SP - 1414
EP - 1424
BT - 37th International Conference on Machine Learning, ICML 2020
A2 - Daume, Hal
A2 - Singh, Aarti
PB - International Machine Learning Society (IMLS)
Y2 - 13 July 2020 through 18 July 2020
ER -