TY - GEN
T1 - Efficient algorithms for adversarial contextual learning
AU - Syrgkanis, Vasilis
AU - Krishnamurthy, Akshay
AU - Schapire, Robert E.
N1 - Publisher Copyright:
Copyright © 2016 by the author(s).
PY - 2016
Y1 - 2016
N2 - We provide the first oracle-efficient sublinear-regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly takes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: (i) in the transductive setting, the learner knows the set of contexts a priori; (ii) in the small separator setting, there exists a small set of contexts such that any two policies behave differently on one of the contexts in the set. Our algorithms fall into the Follow-The-Perturbed-Leader family (Kalai & Vempala, 2005) and achieve regret O(T^{3/4}√(K log(N))) in the transductive setting and O(T^{2/3}dK√(log(N))) in the separator setting, where T is the number of rounds, K is the number of actions, N is the number of baseline policies, and d is the size of the separator. We in fact solve the more general adversarial contextual semi-bandit linear optimization problem, while in the full-information setting we address the even more general contextual combinatorial optimization problem. We provide several extensions and implications of our algorithms, such as switching regret and efficient learning with predictable sequences.
AB - We provide the first oracle-efficient sublinear-regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly takes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: (i) in the transductive setting, the learner knows the set of contexts a priori; (ii) in the small separator setting, there exists a small set of contexts such that any two policies behave differently on one of the contexts in the set. Our algorithms fall into the Follow-The-Perturbed-Leader family (Kalai & Vempala, 2005) and achieve regret O(T^{3/4}√(K log(N))) in the transductive setting and O(T^{2/3}dK√(log(N))) in the separator setting, where T is the number of rounds, K is the number of actions, N is the number of baseline policies, and d is the size of the separator. We in fact solve the more general adversarial contextual semi-bandit linear optimization problem, while in the full-information setting we address the even more general contextual combinatorial optimization problem. We provide several extensions and implications of our algorithms, such as switching regret and efficient learning with predictable sequences.
UR - http://www.scopus.com/inward/record.url?scp=84998579392&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84998579392&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84998579392
T3 - 33rd International Conference on Machine Learning, ICML 2016
SP - 3184
EP - 3206
BT - 33rd International Conference on Machine Learning, ICML 2016
A2 - Weinberger, Kilian Q.
A2 - Balcan, Maria Florina
PB - International Machine Learning Society (IMLS)
T2 - 33rd International Conference on Machine Learning, ICML 2016
Y2 - 19 June 2016 through 24 June 2016
ER -