TY - GEN
T1 - Distributed Bandits
T2 - 2021 European Control Conference, ECC 2021
AU - Madhushani, Udari
AU - Leonard, Naomi Ehrich
N1 - Funding Information:
This research has been supported in part by ONR grants N00014-18-1-2873 and N00014-19-1-2556 and ARO grant W911NF-18-1-0325. U. Madhushani and N.E. Leonard are with the Department of Mechanical and Aerospace Engineering, Princeton University, Princeton, NJ 08544, USA.
Publisher Copyright:
© 2021 EUCA.
PY - 2021
Y1 - 2021
AB - We study the decentralized multi-agent multi-armed bandit problem for agents that communicate probabilistically over a network defined by a d-regular graph. Every edge in the graph has probabilistic weight p to account for the (1 - p) probability of a communication link failure. At each time step, each agent chooses an arm and receives a numerical reward associated with the chosen arm. After each choice, each agent observes, with probability p, the most recent reward obtained by each of its neighbors. We propose a new Upper Confidence Bound (UCB)-based algorithm and analyze how agent-based strategies contribute to minimizing group regret in this probabilistic communication setting. We provide theoretical guarantees that our algorithm outperforms state-of-the-art algorithms. We illustrate our results and validate the theoretical claims using numerical simulations.
UR - http://www.scopus.com/inward/record.url?scp=85120500411&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85120500411&partnerID=8YFLogxK
U2 - 10.23919/ECC54610.2021.9655031
DO - 10.23919/ECC54610.2021.9655031
M3 - Conference contribution
AN - SCOPUS:85120500411
T3 - 2021 European Control Conference, ECC 2021
SP - 830
EP - 835
BT - 2021 European Control Conference, ECC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 29 June 2021 through 2 July 2021
ER -