A sensing policy based on confidence bounds and a restless multi-armed bandit model

Jan Oksanen, Visa Koivunen, H. Vincent Poor

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

8 Scopus citations

Abstract

A sensing policy for the restless multi-armed bandit problem with stationary but unknown reward distributions is proposed. The work is presented in the context of cognitive radios, in which the bandit problem arises when deciding which parts of the spectrum to sense and exploit. It is shown that the proposed policy attains an asymptotically logarithmic weak regret rate when the rewards are bounded and independent and identically distributed (i.i.d.), or finite-state Markovian. Simulation results verifying uniformly logarithmic weak regret are also presented. The proposed policy is a centrally coordinated index policy, in which the index of a frequency band consists of a sample mean term and a confidence term. The sample mean term promotes spectrum exploitation, whereas the confidence term encourages exploration. The confidence term is designed such that the time interval between consecutive sensing instances of any suboptimal band grows exponentially. This exponential growth of the interval between sensing instances of a suboptimal band leads to logarithmically growing weak regret. Simulation results demonstrate that the proposed policy performs better than other similar methods in the literature.
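The index structure described in the abstract (sample mean plus confidence term) can be illustrated with a minimal sketch. The abstract does not give the paper's confidence term, so the sketch substitutes the classical UCB1 term sqrt(2 ln t / n) as a stand-in; it is not the authors' policy. It also assumes i.i.d. Bernoulli band availability and a single band sensed per time slot. The names `ucb_index`, `run_policy`, and `free_probs` are hypothetical and chosen only for this illustration.

```python
import math
import random


def ucb_index(sample_mean, n_sensed, t):
    """Index of one band: exploitation term plus exploration term.

    ASSUMPTION: the confidence term is the classical UCB1 term,
    used here as a stand-in for the paper's (unstated) term.
    """
    return sample_mean + math.sqrt(2.0 * math.log(t) / n_sensed)


def run_policy(free_probs, horizon, seed=0):
    """Sense one band per slot, always choosing the band with the largest index.

    free_probs[i] is the (hypothetical) probability that band i is free;
    the reward is 1.0 when the sensed band is free and 0.0 otherwise.
    Returns how often each band was sensed and its sample mean reward.
    """
    rng = random.Random(seed)
    k = len(free_probs)
    counts = [0] * k    # number of times each band has been sensed
    means = [0.0] * k   # running sample mean of each band's reward
    for t in range(1, horizon + 1):
        if t <= k:
            band = t - 1  # initialization: sense every band once
        else:
            band = max(range(k), key=lambda i: ucb_index(means[i], counts[i], t))
        reward = 1.0 if rng.random() < free_probs[band] else 0.0
        counts[band] += 1
        means[band] += (reward - means[band]) / counts[band]  # incremental mean
    return counts, means


if __name__ == "__main__":
    counts, means = run_policy([0.2, 0.5, 0.9], horizon=5000)
    print("times sensed per band:", counts)
    print("sample mean per band:", [round(m, 3) for m in means])
```

In this sketch a suboptimal band's index rises only through the slowly growing ln t factor, so the band is revisited less and less often as time passes, which is the qualitative behaviour the abstract attributes to the confidence term.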

Original language: English (US)
Title of host publication: Conference Record of the 46th Asilomar Conference on Signals, Systems and Computers, ASILOMAR 2012
Pages: 318-323
Number of pages: 6
State: Published - Dec 1 2012
Event: 46th Asilomar Conference on Signals, Systems and Computers, ASILOMAR 2012 - Pacific Grove, CA, United States
Duration: Nov 4 2012 → Nov 7 2012

Publication series

Name: Conference Record - Asilomar Conference on Signals, Systems and Computers
ISSN (Print): 1058-6393

Other

Other: 46th Asilomar Conference on Signals, Systems and Computers, ASILOMAR 2012
Country: United States
City: Pacific Grove, CA
Period: 11/4/12 → 11/7/12

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Computer Networks and Communications

Keywords

  • Cognitive radio
  • Restless Multi-Armed Bandit
  • Spectrum Sensing Policy
