Abstract
We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O *(√T) regret. The setting is a natural generalization of the non-stochastic multi-armed bandit problem, and the existence of an efficient optimal algorithm has been posed as an open problem in a number of recent papers. We show how the difficulties encountered by previous approaches are overcome by the use of a self-concordant potential function. Our approach presents a novel connection between online learning and interior point methods.
| Original language | English (US) |
|---|---|
| Pages | 263-273 |
| Number of pages | 11 |
| State | Published - 2008 |
| Externally published | Yes |
| Event | 21st Annual Conference on Learning Theory, COLT 2008 - Helsinki, Finland Duration: Jul 9 2008 → Jul 12 2008 |
Other
| Other | 21st Annual Conference on Learning Theory, COLT 2008 |
|---|---|
| Country/Territory | Finland |
| City | Helsinki |
| Period | 7/9/08 → 7/12/08 |
All Science Journal Classification (ASJC) codes
- Education