TY - GEN
T1 - On Using Hamiltonian Monte Carlo Sampling for RL
AU - Madhushani, Udari
AU - Dey, Biswadip
AU - Leonard, Naomi Ehrich
AU - Chakraborty, Amit
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Q-Learning and other value-function-based reinforcement learning (RL) algorithms learn optimal policies from datasets of actions, rewards, and state transitions. However, generating independent and identically distributed (IID) data samples poses a significant challenge when the state transition dynamics are stochastic and high-dimensional; this is due to the intractability of the associated normalizing integral. We address this challenge with Hamiltonian Monte Carlo (HMC) sampling, since it offers a computationally tractable way to generate data for training RL algorithms in stochastic and high-dimensional contexts. We introduce Hamiltonian Q-Learning and use it to demonstrate, theoretically and empirically, that Q values can be learned from a dataset generated by HMC samples of actions, rewards, and state transitions. Hamiltonian Q-Learning also exploits the underlying low-rank structure of the Q function, using a matrix completion algorithm to reconstruct the Q function from Q-value updates over a much smaller subset of state-action pairs. Thus, by providing an efficient way to apply Q-Learning in stochastic, high-dimensional settings, the proposed approach broadens the scope of RL algorithms for real-world applications.
AB - Q-Learning and other value-function-based reinforcement learning (RL) algorithms learn optimal policies from datasets of actions, rewards, and state transitions. However, generating independent and identically distributed (IID) data samples poses a significant challenge when the state transition dynamics are stochastic and high-dimensional; this is due to the intractability of the associated normalizing integral. We address this challenge with Hamiltonian Monte Carlo (HMC) sampling, since it offers a computationally tractable way to generate data for training RL algorithms in stochastic and high-dimensional contexts. We introduce Hamiltonian Q-Learning and use it to demonstrate, theoretically and empirically, that Q values can be learned from a dataset generated by HMC samples of actions, rewards, and state transitions. Hamiltonian Q-Learning also exploits the underlying low-rank structure of the Q function, using a matrix completion algorithm to reconstruct the Q function from Q-value updates over a much smaller subset of state-action pairs. Thus, by providing an efficient way to apply Q-Learning in stochastic, high-dimensional settings, the proposed approach broadens the scope of RL algorithms for real-world applications.
UR - http://www.scopus.com/inward/record.url?scp=85146982604&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146982604&partnerID=8YFLogxK
U2 - 10.1109/CDC51059.2022.9992764
DO - 10.1109/CDC51059.2022.9992764
M3 - Conference contribution
AN - SCOPUS:85146982604
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 6640
EP - 6645
BT - 2022 IEEE 61st Conference on Decision and Control, CDC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 61st IEEE Conference on Decision and Control, CDC 2022
Y2 - 6 December 2022 through 9 December 2022
ER -
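
For readers who want a concrete picture of the two ingredients the abstract describes, the following is a minimal, illustrative Python sketch (not the authors' implementation): Hamiltonian Monte Carlo sampling from an unnormalized density, so the normalizing integral is never evaluated, and low-rank reconstruction of a Q matrix from updates on a small subset of state-action pairs via singular-value soft thresholding (a Soft-Impute-style method). The toy target density, function names, and parameter values are assumptions made purely for illustration.

import numpy as np

def hmc_sample(log_prob, grad_log_prob, x0, n_samples, step=0.1, n_leapfrog=20, rng=None):
    # Hamiltonian Monte Carlo: log_prob may be unnormalized, which is the point --
    # the normalizing integral mentioned in the abstract is never computed.
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)              # resample momentum
        x_new, p_new = x.copy(), p.copy()
        p_new += 0.5 * step * grad_log_prob(x_new)    # initial half step for momentum
        for _ in range(n_leapfrog):
            x_new += step * p_new                     # full step for position
            p_new += step * grad_log_prob(x_new)      # full step for momentum
        p_new -= 0.5 * step * grad_log_prob(x_new)    # correct the last update to a half step
        # Metropolis acceptance based on the change in the Hamiltonian
        h_old = -log_prob(x) + 0.5 * p @ p
        h_new = -log_prob(x_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

def soft_impute(q_observed, mask, rank_penalty=0.5, n_iters=200):
    # Reconstruct a low-rank Q matrix from the entries observed on `mask`
    # by iterative singular-value soft thresholding (Soft-Impute-style).
    q = np.where(mask, q_observed, 0.0)
    q_low = q
    for _ in range(n_iters):
        u, s, vt = np.linalg.svd(q, full_matrices=False)
        s = np.maximum(s - rank_penalty, 0.0)         # shrink singular values
        q_low = (u * s) @ vt
        q = np.where(mask, q_observed, q_low)         # keep observed entries fixed
    return q_low

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # (1) HMC over an unnormalized 4-D Gaussian standing in for a stochastic,
    #     high-dimensional transition density; the sample mean should approach mu.
    mu = np.array([1.0, -0.5, 0.0, 2.0])
    log_p = lambda x: -0.5 * np.sum((x - mu) ** 2)
    grad_log_p = lambda x: -(x - mu)
    next_states = hmc_sample(log_p, grad_log_p, np.zeros(4), n_samples=500, rng=rng)
    print("HMC sample mean:", np.round(next_states[100:].mean(axis=0), 2))

    # (2) Complete a synthetic rank-3 Q matrix from values observed on ~30% of the
    #     state-action pairs, mimicking the paper's low-rank reconstruction step.
    q_true = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 20))
    mask = rng.uniform(size=q_true.shape) < 0.3
    q_hat = soft_impute(np.where(mask, q_true, 0.0), mask)
    print("relative completion error:",
          np.linalg.norm(q_hat - q_true) / np.linalg.norm(q_true))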