Better rates for any adversarial deterministic MDP

Ofer Dekel, Elad Hazan

Research output: Contribution to conferencePaperpeer-review

14 Scopus citations

Abstract

We consider regret minimization in adversarial deterministic Markov Decision Processes (ADMDPs) with bandit feedback. We devise a new algorithm that pushes the state-of-the-art forward in two ways: First, it attains a regret of O(T2/3) with respect to the best fixed policy in hindsight, whereas the previous best regret bound was O(T3/4). Second, the algorithm and its analysis are compatible with any feasible ADMDP graph topology, while all previous approaches required additional restrictions on the graph topology.

Original languageEnglish (US)
Pages1712-1720
Number of pages9
StatePublished - 2013
Externally publishedYes
Event30th International Conference on Machine Learning, ICML 2013 - Atlanta, GA, United States
Duration: Jun 16 2013Jun 21 2013

Other

Other30th International Conference on Machine Learning, ICML 2013
Country/TerritoryUnited States
CityAtlanta, GA
Period6/16/136/21/13

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Sociology and Political Science

Fingerprint

Dive into the research topics of 'Better rates for any adversarial deterministic MDP'. Together they form a unique fingerprint.

Cite this