A Convergent Recursive Least Squares Approximate Policy Iteration Algorithm for Multi-Dimensional Markov Decision Process with Continuous State and Action Spaces

Jun Ma, Warren Buckler Powell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

In this paper, we present a recursive least squares approximate policy iteration (RLSAPI) algorithm for infinite-horizon multi-dimensional Markov decision processes with continuous state and action spaces. Under certain structural assumptions on the value functions and policy spaces, the approximate policy iteration algorithm is provably convergent in the mean. That is, the mean absolute deviation of the approximate policy value function from the optimal value function goes to zero as the successive approximations improve.
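The policy-evaluation step the abstract refers to fits a value-function model by recursive least squares, which updates a linear fit one observation at a time instead of re-solving the full regression. As a minimal, hypothetical sketch (not the paper's implementation; the class and parameter names are illustrative), the standard RLS update for a linear model V(s) ≈ θ·φ(s) looks like this:

```python
class RLS:
    """Recursive least squares for a linear model theta . phi (illustrative sketch)."""

    def __init__(self, dim, delta=100.0):
        self.theta = [0.0] * dim  # weight vector
        # P approximates the inverse feature-covariance matrix;
        # a large initial delta encodes a weak prior on theta.
        self.P = [[(delta if i == j else 0.0) for j in range(dim)]
                  for i in range(dim)]

    def update(self, phi, target):
        n = len(phi)
        # Pphi = P * phi
        Pphi = [sum(self.P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(phi[i] * Pphi[i] for i in range(n))
        gain = [v / denom for v in Pphi]  # Kalman-style gain vector
        error = target - sum(t * p for t, p in zip(self.theta, phi))
        self.theta = [t + g * error for t, g in zip(self.theta, gain)]
        # Sherman-Morrison downdate: P <- P - gain * (phi^T P)
        phiTP = [sum(phi[i] * self.P[i][j] for i in range(n)) for j in range(n)]
        for i in range(n):
            for j in range(n):
                self.P[i][j] -= gain[i] * phiTP[j]
```

For example, feeding observations of a noiseless linear target y = 2x + 1 with features φ(x) = [x, 1] drives `theta` toward [2, 1]; in an RLSAPI-style loop, the target would instead be a sampled estimate of the policy's value at the observed state.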

Original language: English (US)
Title of host publication: 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings
Pages: 66-73
Number of pages: 8
DOIs
State: Published - 2009
Event: 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Nashville, TN, United States
Duration: Mar 30 2009 - Apr 2 2009

Publication series

Name: 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings

Other

Other: 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009
Country/Territory: United States
City: Nashville, TN
Period: 3/30/09 - 4/2/09

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Software
