Finite-time lower bounds for the two-armed bandit problem

Sanjeev R. Kulkarni, Gábor Lugosi

Research output: Contribution to journal (Article)

15 Scopus citations

Abstract

We obtain minimax lower bounds on the regret for the classical two-armed bandit problem. We provide a finite-sample minimax version of the well-known log n asymptotic lower bound of Lai and Robbins. The finite-time lower bound allows us to derive conditions for the amount of time necessary to make any significant gain over a random guessing strategy. These bounds depend on the class of possible distributions of the rewards associated with the arms. For example, in contrast to the log n asymptotic results on the regret, we show that the minimax regret is achieved by mere random guessing under fairly mild conditions on the set of allowable configurations of the two arms.
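The abstract measures strategies against "mere random guessing". The following is a minimal sketch, not taken from the paper, of what that baseline costs. It assumes two arms with hypothetical Bernoulli means `mu` and defines regret as the shortfall relative to always playing the arm with the larger mean. Under those assumptions, a policy that picks each arm with probability 1/2 has regret n·Δ/2, where Δ = |mu[0] − mu[1]|. This grows linearly in n, whereas the Lai and Robbins bound is logarithmic. The function names are illustrative only.

```python
import random


def regret_of_random_guessing(mu, n):
    """Expected regret after n rounds of a policy that picks each of the two
    arms with probability 1/2, ignoring all observed rewards.

    Regret is measured against always playing the arm with the larger mean:
        R_n = n * max(mu) - E[total reward] = n * |mu[0] - mu[1]| / 2.
    """
    best = max(mu)
    # Each round, the expected reward is the average of the two means.
    return n * (best - (mu[0] + mu[1]) / 2)


def simulate_pseudo_regret(mu, n, trials=200, seed=0):
    """Monte Carlo estimate of the same quantity: the average over trials of
    sum_t (max(mu) - mu[arm_t]), where each arm_t is drawn uniformly at random."""
    rng = random.Random(seed)
    best = max(mu)
    total = 0.0
    for _ in range(trials):
        total += sum(best - mu[rng.randrange(2)] for _ in range(n))
    return total / trials


if __name__ == "__main__":
    mu = (0.5, 0.6)  # hypothetical means of the two arms
    n = 1000
    print(regret_of_random_guessing(mu, n))
    print(simulate_pseudo_regret(mu, n))
```

With these illustrative values, both printed numbers come out close to n·Δ/2 = 50. The simulation only confirms the closed form. It says nothing about the paper's minimax conditions, which depend on the class of allowable reward distributions.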

Original language: English (US)
Pages (from-to): 711-714
Number of pages: 4
Journal: IEEE Transactions on Automatic Control
Volume: 45
Issue number: 4
State: Published - Apr 1 2000

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Computer Science Applications
  • Electrical and Electronic Engineering
