On Using Hamiltonian Monte Carlo Sampling for RL

Udari Madhushani, Biswadip Dey, Naomi Ehrich Leonard, Amit Chakraborty

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Q-Learning and other value-function-based reinforcement learning (RL) algorithms learn optimal policies from datasets of actions, rewards, and state transitions. However, generating independent and identically distributed (IID) data samples poses a significant challenge when the state transition dynamics are stochastic and high-dimensional, because the associated normalizing integral is intractable. We address this challenge with Hamiltonian Monte Carlo (HMC) sampling, which offers a computationally tractable way to generate data for training RL algorithms in stochastic and high-dimensional contexts. We introduce Hamiltonian Q-Learning and use it to demonstrate, theoretically and empirically, that Q values can be learned from a dataset generated by HMC samples of actions, rewards, and state transitions. Hamiltonian Q-Learning also exploits the underlying low-rank structure of the Q function, using a matrix completion algorithm to reconstruct the Q function from Q value updates over a much smaller subset of state-action pairs. Thus, by providing an efficient way to apply Q-Learning in stochastic, high-dimensional settings, the proposed approach broadens the scope of RL algorithms for real-world applications.
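
To illustrate why HMC sidesteps the normalizing integral mentioned in the abstract, the following is a minimal, generic HMC sketch in Python. It is not the authors' implementation of Hamiltonian Q-Learning. The function names (`hmc_sample`, `log_density`, `grad_log_density`), the Gaussian target, and the step-size settings are all assumptions chosen for illustration. The point it shows is that the sampler only needs the log of an unnormalized density and its gradient.

```python
import numpy as np

# Hypothetical target: an unnormalized 2-D Gaussian centred at TARGET_MEAN.
# The normalizing constant is deliberately never computed.
TARGET_MEAN = np.array([1.0, -2.0])


def log_density(x):
    """Log of the unnormalized target density (illustrative assumption)."""
    return -0.5 * np.sum((x - TARGET_MEAN) ** 2)


def grad_log_density(x):
    """Gradient of log_density with respect to x."""
    return -(x - TARGET_MEAN)


def hmc_sample(log_prob, grad_log_prob, x0, n_samples,
               step_size=0.2, n_leapfrog=10, rng=None):
    """Draw n_samples with basic HMC (identity mass matrix)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    for i in range(n_samples):
        # Resample an auxiliary momentum for each proposal.
        p = rng.standard_normal(x.size)
        x_new, p_new = x.copy(), p.copy()

        # Leapfrog integration of the Hamiltonian dynamics.
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += step_size * p_new
            p_new += step_size * grad_log_prob(x_new)
        x_new += step_size * p_new
        p_new += 0.5 * step_size * grad_log_prob(x_new)

        # Metropolis correction based on the change in total energy.
        h_old = -log_prob(x) + 0.5 * (p @ p)
        h_new = -log_prob(x_new) + 0.5 * (p_new @ p_new)
        if np.log(rng.uniform()) < h_old - h_new:
            x = x_new
        samples[i] = x
    return samples
```

Under these assumptions, calling `hmc_sample(log_density, grad_log_density, np.zeros(2), 2000)` returns an array of shape `(2000, 2)` whose rows approximate draws from the target. In an RL setting the target would instead be the state transition distribution, which this sketch does not model.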

Original language: English (US)
Title of host publication: 2022 IEEE 61st Conference on Decision and Control, CDC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6640-6645
Number of pages: 6
ISBN (Electronic): 9781665467612
State: Published - 2022
Externally published: Yes
Event: 61st IEEE Conference on Decision and Control, CDC 2022 - Cancun, Mexico
Duration: Dec 6 2022 → Dec 9 2022

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2022-December
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 61st IEEE Conference on Decision and Control, CDC 2022
Country/Territory: Mexico
City: Cancun
Period: 12/6/22 → 12/9/22

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
