TY - JOUR

T1 - Multi-armed bandits with metric movement costs

AU - Koren, Tomer

AU - Livni, Roi

AU - Mansour, Yishay

N1 - Funding Information:
RL is supported in funds by the Eric and Wendy Schmidt Foundation for strategic innovations. YM is supported in part by a grant from the Israel Science Foundation, a grant from the United States-Israel Binational Science Foundation (BSF), and the Israeli Centers of Research Excellence (I-CORE) program (Center No. 4/11).
Publisher Copyright:
© 2017 Neural information processing systems foundation. All rights reserved.

PY - 2017

Y1 - 2017

N2 - We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure C of the underlying metric which depends on its covering numbers. In finite metric spaces with k actions, we give an efficient algorithm that achieves regret of the form Õ(max{C1/3T2/3, √kT}), and show that this is the best possible. Our regret bound generalizes previous known regret bounds for some special cases: (i) the unit-switching cost regret ⊙(max{k1/3T2/3, √kT}) where C = ⊙(k), and (ii) the interval metric with regret ⊙(max{T2/3, √kT}) where C = ⊙(1). For infinitemetrics spaces with Lipschitz loss functions, we derive a tight regret bound of ⊙(T d+1/d+2) where d ≥ 1 is the Minkowski dimension of the space, which is known to be tight even when there are no switching costs.

AB - We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure C of the underlying metric which depends on its covering numbers. In finite metric spaces with k actions, we give an efficient algorithm that achieves regret of the form Õ(max{C1/3T2/3, √kT}), and show that this is the best possible. Our regret bound generalizes previous known regret bounds for some special cases: (i) the unit-switching cost regret ⊙(max{k1/3T2/3, √kT}) where C = ⊙(k), and (ii) the interval metric with regret ⊙(max{T2/3, √kT}) where C = ⊙(1). For infinitemetrics spaces with Lipschitz loss functions, we derive a tight regret bound of ⊙(T d+1/d+2) where d ≥ 1 is the Minkowski dimension of the space, which is known to be tight even when there are no switching costs.

UR - http://www.scopus.com/inward/record.url?scp=85047018503&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85047018503&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85047018503

SN - 1049-5258

VL - 2017-December

SP - 4120

EP - 4129

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

T2 - 31st Annual Conference on Neural Information Processing Systems, NIPS 2017

Y2 - 4 December 2017 through 9 December 2017

ER -