Abstract
We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O *(√T) regret. The setting is a natural generalization of the non-stochastic multi-armed bandit problem, and the existence of an efficient optimal algorithm has been posed as an open problem in a number of recent papers. We show how the difficulties encountered by previous approaches are overcome by the use of a self-concordant potential function. Our approach presents a novel connection between online learning and interior point methods.
Original language | English (US) |
---|---|
Pages | 263-273 |
Number of pages | 11 |
State | Published - 2008 |
Externally published | Yes |
Event | 21st Annual Conference on Learning Theory, COLT 2008 - Helsinki, Finland Duration: Jul 9 2008 → Jul 12 2008 |
Other
Other | 21st Annual Conference on Learning Theory, COLT 2008 |
---|---|
Country/Territory | Finland |
City | Helsinki |
Period | 7/9/08 → 7/12/08 |
All Science Journal Classification (ASJC) codes
- Education