We study the problem of predicting individual sequences with linear loss with full and partial (or bandit) feedback. Our main contribution is the first efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal Õ(√T) regret. In addition, for the full-information setting, we give a novel regret minimization algorithm. These results are made possible by the introduction of interior-point methods for convex optimization to online learning.
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Library and Information Sciences