TY - GEN
T1 - Towards minimax online learning with unknown time horizon
AU - Luo, Haipeng
AU - Schapire, Robert E.
PY - 2014
Y1 - 2014
N2 - 2014 We consider online learning when the time horizon is unknown, We apply a minimax analysis, beginning with the fixed horizon case, and then moving on to two unknown-horizon settings, one that assumes the horizon is chosen randomly according to some distribution, and the other which allows the adversary full control over the horizon. For the random horizon setting with restricted losses, we derive a fully optimal mini- max algorithm. And for the adversarial horizon setting, we prove a nontrivial lower bound which shows that the adversary obtains strictly more power than when the horizon is fixed and known. Based on the minimax solution of the random horizon setting, we then propose a new adaptive algorithm which "pretends" that the horizon is drawn from a distribution from a special family, but no matter how the actual horizon is chosen, the worst-case regret is of the optimal rate. Furthermore, our algorithm can be combined and applied in many ways, for instance, to online convex optimization, follow the perturbed leader, exponential weights algorithm and first order bounds. Experiments show that our algorithm outperforms many other existing algorithms in an online linear optimization setting.
AB - 2014 We consider online learning when the time horizon is unknown, We apply a minimax analysis, beginning with the fixed horizon case, and then moving on to two unknown-horizon settings, one that assumes the horizon is chosen randomly according to some distribution, and the other which allows the adversary full control over the horizon. For the random horizon setting with restricted losses, we derive a fully optimal mini- max algorithm. And for the adversarial horizon setting, we prove a nontrivial lower bound which shows that the adversary obtains strictly more power than when the horizon is fixed and known. Based on the minimax solution of the random horizon setting, we then propose a new adaptive algorithm which "pretends" that the horizon is drawn from a distribution from a special family, but no matter how the actual horizon is chosen, the worst-case regret is of the optimal rate. Furthermore, our algorithm can be combined and applied in many ways, for instance, to online convex optimization, follow the perturbed leader, exponential weights algorithm and first order bounds. Experiments show that our algorithm outperforms many other existing algorithms in an online linear optimization setting.
UR - http://www.scopus.com/inward/record.url?scp=84919833476&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84919833476&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84919833476
T3 - 31st International Conference on Machine Learning, ICML 2014
SP - 378
EP - 397
BT - 31st International Conference on Machine Learning, ICML 2014
PB - International Machine Learning Society (IMLS)
T2 - 31st International Conference on Machine Learning, ICML 2014
Y2 - 21 June 2014 through 26 June 2014
ER -