Scalable bilinear π learning using state and action features

Yichen Chen, Lihong Li, Mengdi Wang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

7 Scopus citations

Abstract

AApproximate linear programming (ALP) represents one of the major algorithmic families to solve large-scale Markov decision processes (MDP). In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear π learning for reinforcement learning when a sampling oracle is provided. This algorithm enjoys a number of advantages. First, it adopts (bi)linear models to represent the high-dimensional value function and state-action distributions, using given state and action features. Its run-time complexity depends on the number of features, not the size of the underlying MDPs. Second, it operates in a fully online fashion without having to store any sample, thus having minimal memory footprint. Third, we prove that it is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space.

Original languageEnglish (US)
Title of host publication35th International Conference on Machine Learning, ICML 2018
EditorsJennifer Dy, Andreas Krause
PublisherInternational Machine Learning Society (IMLS)
Pages1305-1319
Number of pages15
ISBN (Electronic)9781510867963
StatePublished - 2018
Event35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018Jul 15 2018

Publication series

Name35th International Conference on Machine Learning, ICML 2018
Volume2

Other

Other35th International Conference on Machine Learning, ICML 2018
Country/TerritorySweden
CityStockholm
Period7/10/187/15/18

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Fingerprint

Dive into the research topics of 'Scalable bilinear π learning using state and action features'. Together they form a unique fingerprint.

Cite this