TY - GEN
T1 - Bandit problems in networks
T2 - 2011 50th IEEE Conference on Decision and Control and European Control Conference, CDC-ECC 2011
AU - Kar, Soummya
AU - Poor, H. Vincent
AU - Cui, Shuguang
PY - 2011
Y1 - 2011
N2 - This paper studies the multi-agent bandit problem in a distributed networked setting. The setting considered assumes only one bandit (the major bandit) has accessible reward information from its samples, whereas the rest (the minor bandits) have unobservable rewards. Under the assumption that the minor bandits are aware of the sampling pattern of the major bandit (but with no direct access to its rewards), a lower bound on the expected average network regret is obtained. The lower bound resembles the logarithmic optimal regret attained in single (classical) bandit problems, but in addition is shown to scale down with the number of agents. A collaborative and adaptive distributed allocation rule DA is proposed and is shown to achieve the lower bound on the expected average regret for a connected inter-bandit communication network. In particular, it is shown that under the DA allocation rule, the minor bandits attain sub-logarithmic expected regrets as opposed to logarithmic in the single agent setting.
AB - This paper studies the multi-agent bandit problem in a distributed networked setting. The setting considered assumes only one bandit (the major bandit) has accessible reward information from its samples, whereas the rest (the minor bandits) have unobservable rewards. Under the assumption that the minor bandits are aware of the sampling pattern of the major bandit (but with no direct access to its rewards), a lower bound on the expected average network regret is obtained. The lower bound resembles the logarithmic optimal regret attained in single (classical) bandit problems, but in addition is shown to scale down with the number of agents. A collaborative and adaptive distributed allocation rule DA is proposed and is shown to achieve the lower bound on the expected average regret for a connected inter-bandit communication network. In particular, it is shown that under the DA allocation rule, the minor bandits attain sub-logarithmic expected regrets as opposed to logarithmic in the single agent setting.
KW - Asymptotically Efficient
KW - Distributed Allocation Rules
KW - Networked Bandit Problems
KW - Partially Observable Rewards
UR - http://www.scopus.com/inward/record.url?scp=84860652887&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84860652887&partnerID=8YFLogxK
U2 - 10.1109/CDC.2011.6160719
DO - 10.1109/CDC.2011.6160719
M3 - Conference contribution
AN - SCOPUS:84860652887
SN - 9781612848006
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 1771
EP - 1778
BT - 2011 50th IEEE Conference on Decision and Control and European Control Conference, CDC-ECC 2011
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 12 December 2011 through 15 December 2011
ER -