Abstract
We study the problem of predicting individual sequences with linear loss with full and partial (or bandit) feedback. Our main contribution is the first efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal Õ(√T) regret. In addition, for the full-information setting, we give a novel regret minimization algorithm. These results are made possible by the introduction of interior-point methods for convex optimization to online learning.
Original language | English (US) |
---|---|
Article number | 6191328 |
Pages (from-to) | 4164-4175 |
Number of pages | 12 |
Journal | IEEE Transactions on Information Theory |
Volume | 58 |
Issue number | 7 |
DOIs | |
State | Published - 2012 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Library and Information Sciences
Keywords
- Bandit feedback
- interior-point methods
- online convex optimization
- online learning