TY - GEN
T1 - A Convergent Recursive Least Squares Approximate Policy Iteration Algorithm for Multi-Dimensional Markov Decision Process with Continuous State and Action Spaces
AU - Ma, Jun
AU - Powell, Warren Buckler
N1 - Copyright:
Copyright 2012 Elsevier B.V., All rights reserved.
PY - 2009
Y1 - 2009
N2 - In this paper, we present a recursive least squares approximate policy iteration (RLSAPI) algorithm for infinite-horizon multi-dimensional Markov decision processes in continuous state and action spaces. Under certain problem structure assumptions on value functions and policy spaces, the approximate policy iteration algorithm is provably convergent in the mean. That is to say, the mean absolute deviation of the approximate policy value function from the optimal value function goes to zero as the successive approximations improve.
AB - In this paper, we present a recursive least squares approximate policy iteration (RLSAPI) algorithm for infinite-horizon multi-dimensional Markov decision processes in continuous state and action spaces. Under certain problem structure assumptions on value functions and policy spaces, the approximate policy iteration algorithm is provably convergent in the mean. That is to say, the mean absolute deviation of the approximate policy value function from the optimal value function goes to zero as the successive approximations improve.
UR - http://www.scopus.com/inward/record.url?scp=67650505341&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67650505341&partnerID=8YFLogxK
U2 - 10.1109/ADPRL.2009.4927527
DO - 10.1109/ADPRL.2009.4927527
M3 - Conference contribution
AN - SCOPUS:67650505341
SN - 9781424427611
T3 - 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings
SP - 66
EP - 73
BT - 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings
T2 - 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009
Y2 - 30 March 2009 through 2 April 2009
ER -