Bandit problems in networks: Asymptotically efficient distributed allocation rules

Soummya Kar, H. Vincent Poor, Shuguang Cui

Research output: Chapter in Book/Report/Conference proceedingConference contribution

13 Scopus citations

Abstract

This paper studies the multi-agent bandit problem in a distributed networked setting. The setting considered assumes only one bandit (the major bandit) has accessible reward information from its samples, whereas the rest (the minor bandits) have unobservable rewards. Under the assumption that the minor bandits are aware of the sampling pattern of the major bandit (but with no direct access to its rewards), a lower bound on the expected average network regret is obtained. The lower bound resembles the logarithmic optimal regret attained in single (classical) bandit problems, but in addition is shown to scale down with the number of agents. A collaborative and adaptive distributed allocation rule DA is proposed and is shown to achieve the lower bound on the expected average regret for a connected inter-bandit communication network. In particular, it is shown that under the DA allocation rule, the minor bandits attain sub-logarithmic expected regrets as opposed to logarithmic in the single agent setting.

Original languageEnglish (US)
Title of host publication2011 50th IEEE Conference on Decision and Control and European Control Conference, CDC-ECC 2011
Pages1771-1778
Number of pages8
DOIs
StatePublished - Dec 1 2011
Event2011 50th IEEE Conference on Decision and Control and European Control Conference, CDC-ECC 2011 - Orlando, FL, United States
Duration: Dec 12 2011Dec 15 2011

Publication series

NameProceedings of the IEEE Conference on Decision and Control
ISSN (Print)0191-2216

Other

Other2011 50th IEEE Conference on Decision and Control and European Control Conference, CDC-ECC 2011
CountryUnited States
CityOrlando, FL
Period12/12/1112/15/11

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization

Keywords

  • Asymptotically Efficient
  • Distributed Allocation Rules
  • Networked Bandit Problems
  • Partially Observable Rewards

Fingerprint Dive into the research topics of 'Bandit problems in networks: Asymptotically efficient distributed allocation rules'. Together they form a unique fingerprint.

  • Cite this

    Kar, S., Poor, H. V., & Cui, S. (2011). Bandit problems in networks: Asymptotically efficient distributed allocation rules. In 2011 50th IEEE Conference on Decision and Control and European Control Conference, CDC-ECC 2011 (pp. 1771-1778). [6160719] (Proceedings of the IEEE Conference on Decision and Control). https://doi.org/10.1109/CDC.2011.6160719