An online primal-dual method for discounted Markov decision processes

Mengdi Wang, Yichen Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

We consider the online solution of discounted Markov decision processes (MDPs). We focus on the black-box learning model, in which the transition probabilities and state transition costs are unknown; instead, a simulator is available to generate random state transitions under given actions. We propose a stochastic primal-dual algorithm for solving the linear programming formulation of the Bellman equation. The algorithm updates the primal and dual iterates using sample state transitions and sample costs generated by the simulator. We provide a thresholding procedure that recovers the exact optimal policy from the dual iterates with high probability.
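The abstract describes the method only at a high level. As a rough illustration (a schematic sketch, not the authors' exact algorithm or analysis), the following NumPy snippet runs a stochastic primal-dual iteration on the linear programming formulation of the Bellman equation: the primal iterate is a value-function estimate, the dual iterate is a normalized state-action weight vector updated multiplicatively from simulated transitions, and a policy is read off the dual iterate at the end. The MDP instance, step sizes, clipping bounds, and the argmax rule standing in for the paper's thresholding procedure are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small tabular MDP, used only to play the role of the simulator.
n_states, n_actions, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[i, a, j]
C = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # cost c(i, a)

def simulate(i, a):
    """Black-box simulator: sample a next state and observe the cost."""
    j = rng.choice(n_states, p=P[i, a])
    return j, C[i, a]

# Lagrangian of the Bellman LP (max over v, min over mu >= 0):
#   L(v, mu) = (1 - gamma) * alpha^T v
#              + sum_{i,a} mu[i,a] * (c(i,a) + gamma * (P_a v)(i) - v(i))
alpha = np.full(n_states, 1.0 / n_states)                  # initial-state distribution
v = np.zeros(n_states)                                      # primal iterate (values)
mu = np.full((n_states, n_actions), 1.0 / (n_states * n_actions))  # dual iterate

for t in range(1, 20001):
    eta = 1.0 / np.sqrt(t)                 # diminishing step size (assumption)
    # Sample a state-action pair from the current dual iterate, then simulate.
    idx = rng.choice(n_states * n_actions, p=mu.ravel())
    i, a = divmod(idx, n_actions)
    j, cost = simulate(i, a)
    # Unbiased stochastic gradient of L with respect to v; ascend and clip
    # to the known value range [0, 1/(1-gamma)] for costs in [0, 1].
    g = (1.0 - gamma) * alpha
    g = g.copy()
    g[j] += gamma
    g[i] -= 1.0
    v = np.clip(v + eta * g, 0.0, 1.0 / (1.0 - gamma))
    # Multiplicative dual descent on the sampled coordinate: actions with
    # positive Bellman slack (suboptimal actions) lose mass; renormalize.
    mu[i, a] *= np.exp(-eta * (cost + gamma * v[j] - v[i]))
    mu /= mu.sum()

# Stand-in for the thresholding step: in each state, keep the action
# carrying the largest dual mass.
policy = mu.argmax(axis=1)
```

The dual iterate plays the role of a (normalized) state-action occupancy measure, which is why a greedy policy can be extracted from it once the suboptimal actions' mass has been driven down.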

Original language: English (US)
Title of host publication: 2016 IEEE 55th Conference on Decision and Control, CDC 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4516-4521
Number of pages: 6
ISBN (Electronic): 9781509018376
DOIs
State: Published - Dec 27 2016
Event: 55th IEEE Conference on Decision and Control, CDC 2016 - Las Vegas, United States
Duration: Dec 12 2016 - Dec 14 2016

Publication series

Name: 2016 IEEE 55th Conference on Decision and Control, CDC 2016

Other

Other: 55th IEEE Conference on Decision and Control, CDC 2016
Country: United States
City: Las Vegas
Period: 12/12/16 - 12/14/16

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Decision Sciences (miscellaneous)
  • Control and Optimization

