TY - GEN
T1 - On distributed cooperative decision-making in multiarmed bandits
AU - Landgren, Peter
AU - Srivastava, Vaibhav
AU - Leonard, Naomi Ehrich
N1 - Publisher Copyright: © 2016 EUCA.
PY - 2016
Y1 - 2016
N2 - We study the explore-exploit tradeoff in distributed cooperative decision-making using the context of the multiarmed bandit (MAB) problem. For the distributed cooperative MAB problem, we design the cooperative UCB algorithm that comprises two interleaved distributed processes: (i) running consensus algorithms for estimation of rewards, and (ii) upper-confidence-bound-based heuristics for selection of arms. We rigorously analyze the performance of the cooperative UCB algorithm and characterize the influence of communication graph structure on the decision-making performance of the group.
UR - http://www.scopus.com/inward/record.url?scp=85015026078&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85015026078&partnerID=8YFLogxK
U2 - 10.1109/ECC.2016.7810293
DO - 10.1109/ECC.2016.7810293
M3 - Conference contribution
AN - SCOPUS:85015026078
T3 - 2016 European Control Conference, ECC 2016
SP - 243
EP - 248
BT - 2016 European Control Conference, ECC 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 European Control Conference, ECC 2016
Y2 - 29 June 2016 through 1 July 2016
ER -