TY - JOUR
T1 - Heterogeneous Explore-Exploit Strategies on Multi-Star Networks
AU - Madhushani, Udari
AU - Leonard, Naomi Ehrich
N1 - Funding Information:
Manuscript received September 1, 2020; revised November 2, 2020; accepted November 17, 2020. Date of publication December 3, 2020; date of current version December 23, 2020. This research was supported in part by ONR Grant N00014-18-1-2873 and Grant N00014-19-1-2556, and in part by the ARO under Grant W911NF-18-1-0325. Recommended by Senior Editor M. Arcak. (Corresponding author: Udari Madhushani.) The authors are with the Department of Mechanical and Aerospace Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: udarim@princeton.edu; naomi@princeton.edu). Digital Object Identifier 10.1109/LCSYS.2020.3042459
Publisher Copyright:
© 2020 IEEE.
PY - 2021/11
Y1 - 2021/11
N2 - We investigate the benefits of heterogeneity in multi-agent explore-exploit decision making where the goal of the agents is to maximize cumulative group reward. To do so we study a class of distributed stochastic bandit problems in which agents communicate over a multi-star network and make sequential choices among options in the same uncertain environment. Typically, in multi-agent bandit problems, agents use homogeneous decision-making strategies. However, group performance can be improved by incorporating heterogeneity into the choices agents make, especially when the network graph is irregular, i.e., when agents have different numbers of neighbors. We design and analyze new heterogeneous explore-exploit strategies, using the multi-star as the model irregular network graph. The key idea is to enable center agents to do more exploring than they would do using the homogeneous strategy, as a means of providing more useful data to the peripheral agents. In the case that all agents broadcast their reward values and choices to their neighbors with the same probability, we provide theoretical guarantees that group performance improves under the proposed heterogeneous strategies as compared to under homogeneous strategies. We use numerical simulations to illustrate our results and to validate our theoretical bounds.
KW - Bandit algorithms
KW - distributed learning
KW - heterogeneous strategies
UR - http://www.scopus.com/inward/record.url?scp=85097953617&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097953617&partnerID=8YFLogxK
U2 - 10.1109/LCSYS.2020.3042459
DO - 10.1109/LCSYS.2020.3042459
M3 - Article
AN - SCOPUS:85097953617
SN - 2475-1456
VL - 5
SP - 1603
EP - 1608
JO - IEEE Control Systems Letters
JF - IEEE Control Systems Letters
IS - 5
M1 - 9279222
ER -