Maximum Likelihood Tensor Decomposition of Markov Decision Process

Chengzhuo Ni, Mengdi Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Model reduction is critical to reinforcement learning in high dimensions. In this paper, we study how to learn a reduced model of an unknown Markov decision process (MDP) from empirical trajectories. We focus on estimating the tensor decomposition of an unknown MDP, where the factor matrices correspond to the state and action features. For this purpose, we develop a tensor-rank-constrained maximum likelihood estimator and prove statistical upper bounds on the Kullback-Leibler divergence error and the ℓ2 error between the estimated model and the true model. An information-theoretic lower bound for the estimation error is also provided, and it nearly matches the upper bound. This estimator provides a statistically efficient approach to model compression and feature extraction in reinforcement learning.
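The low-rank structure and the likelihood objective described in the abstract can be sketched numerically. The toy below builds a transition tensor P[s, a, s'] in generic CP (rank-R) form and evaluates the negative log-likelihood of sampled transitions; the factor names U, V, W, the dimensions, and the normalization step are illustrative assumptions, and this is not the paper's rank-constrained estimator.

```python
import numpy as np

# Toy MDP with S states, A actions, and a rank-R CP structure:
# P[s, a, s'] proportional to sum_r U[s, r] * V[a, r] * W[s', r].
rng = np.random.default_rng(0)
S, A, R = 6, 3, 2

U = rng.random((S, R))  # state factor (assumed names, for illustration)
V = rng.random((A, R))  # action factor
W = rng.random((S, R))  # next-state factor

# Assemble the rank-R tensor from its CP factors.
P = np.einsum('sr,ar,tr->sat', U, V, W)

# Normalize over next states so each (s, a) slice is a distribution.
P /= P.sum(axis=2, keepdims=True)
assert np.allclose(P.sum(axis=2), 1.0)

# Sample transitions from the model and compute the average negative
# log-likelihood -- the objective a maximum likelihood estimator would
# minimize over rank-constrained candidate models.
n = 1000
s = rng.integers(0, S, n)
a = rng.integers(0, A, n)
s_next = np.array([rng.choice(S, p=P[s[i], a[i]]) for i in range(n)])
nll = -np.log(P[s, a, s_next]).mean()
```

A rank-constrained MLE would search over factor triples (U, V, W) to minimize this `nll` on observed trajectories, rather than evaluating it under the true model as done here.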

Original language: English (US)
Title of host publication: 2019 IEEE International Symposium on Information Theory, ISIT 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3062-3066
Number of pages: 5
ISBN (Electronic): 9781538692912
DOIs
State: Published - Jul 2019
Event: 2019 IEEE International Symposium on Information Theory, ISIT 2019 - Paris, France
Duration: Jul 7 2019 - Jul 12 2019

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
Volume: 2019-July
ISSN (Print): 2157-8095

Conference

Conference: 2019 IEEE International Symposium on Information Theory, ISIT 2019
Country/Territory: France
City: Paris
Period: 7/7/19 - 7/12/19

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Information Systems
  • Modeling and Simulation
  • Applied Mathematics

