TY - GEN
T1 - Maximum Likelihood Tensor Decomposition of Markov Decision Process
AU - Ni, Chengzhuo
AU - Wang, Mengdi
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/7
Y1 - 2019/7
N2 - Model reduction is critical to reinforcement learning in high dimensions. In this paper, we study how to learn a reduced model of an unknown Markov decision process (MDP) from empirical trajectories. We focus on estimating the tensor decomposition of an unknown MDP, where the factor matrices correspond to the state and action features. For this purpose, we develop a tensor-rank-constrained maximum likelihood estimator and prove statistical upper bounds of the Kullback-Leibler divergence error and the ℓ2 error between the estimated model and true model. An information-theoretic lower bound for the estimation error is also provided and it nearly matches the upper bound. This estimator provides a statistically efficient approach for model compression and feature extraction in reinforcement learning.
AB - Model reduction is critical to reinforcement learning in high dimensions. In this paper, we study how to learn a reduced model of an unknown Markov decision process (MDP) from empirical trajectories. We focus on estimating the tensor decomposition of an unknown MDP, where the factor matrices correspond to the state and action features. For this purpose, we develop a tensor-rank-constrained maximum likelihood estimator and prove statistical upper bounds of the Kullback-Leibler divergence error and the ℓ2 error between the estimated model and true model. An information-theoretic lower bound for the estimation error is also provided and it nearly matches the upper bound. This estimator provides a statistically efficient approach for model compression and feature extraction in reinforcement learning.
UR - http://www.scopus.com/inward/record.url?scp=85073151577&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073151577&partnerID=8YFLogxK
U2 - 10.1109/ISIT.2019.8849765
DO - 10.1109/ISIT.2019.8849765
M3 - Conference contribution
AN - SCOPUS:85073151577
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 3062
EP - 3066
BT - 2019 IEEE International Symposium on Information Theory, ISIT 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Symposium on Information Theory, ISIT 2019
Y2 - 7 July 2019 through 12 July 2019
ER -