Multi-armed bandits with metric movement costs

Tomer Koren, Roi Livni, Yishay Mansour

Research output: Contribution to journalConference articlepeer-review

16 Scopus citations

Abstract

We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure C of the underlying metric which depends on its covering numbers. In finite metric spaces with k actions, we give an efficient algorithm that achieves regret of the form Õ(max{C1/3T2/3, √kT}), and show that this is the best possible. Our regret bound generalizes previous known regret bounds for some special cases: (i) the unit-switching cost regret ⊙(max{k1/3T2/3, √kT}) where C = ⊙(k), and (ii) the interval metric with regret ⊙(max{T2/3, √kT}) where C = ⊙(1). For infinitemetrics spaces with Lipschitz loss functions, we derive a tight regret bound of ⊙(T d+1/d+2) where d ≥ 1 is the Minkowski dimension of the space, which is known to be tight even when there are no switching costs.

Original languageEnglish (US)
Pages (from-to)4120-4129
Number of pages10
JournalAdvances in Neural Information Processing Systems
Volume2017-December
StatePublished - 2017
Event31st Annual Conference on Neural Information Processing Systems, NIPS 2017 - Long Beach, United States
Duration: Dec 4 2017Dec 9 2017

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint

Dive into the research topics of 'Multi-armed bandits with metric movement costs'. Together they form a unique fingerprint.

Cite this