Q-Learning and other value-function-based reinforcement learning (RL) algorithms learn optimal policies from datasets of actions, rewards, and state transitions. However, when the state transition dynamics are stochastic and high-dimensional, generating independent and identically distributed (IID) data samples poses a significant challenge because the associated normalizing integral is intractable. We address this challenge with Hamiltonian Monte Carlo (HMC) sampling, which offers a computationally tractable way to generate training data for RL algorithms in stochastic, high-dimensional settings. We introduce Hamiltonian Q-Learning and demonstrate, both theoretically and empirically, that Q values can be learned from a dataset of HMC samples of actions, rewards, and state transitions. Hamiltonian Q-Learning further exploits the underlying low-rank structure of the Q function, using a matrix completion algorithm to reconstruct the full Q function from Q value updates over a much smaller subset of state-action pairs. By providing an efficient way to apply Q-Learning in stochastic, high-dimensional settings, the proposed approach broadens the scope of RL algorithms for real-world applications.
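To make the two ingredients concrete, the sketch below is a minimal illustration, not the paper's implementation: (1) a basic HMC sampler (leapfrog integration plus a Metropolis acceptance step) drawing next-state samples from an unnormalized transition density, and (2) an SVD-based hard-thresholding loop that fills in a Q matrix from a sparse set of observed entries. The toy Gaussian dynamics, the fixed-rank completion scheme, and all function names (`log_density`, `hmc_sample`, `complete_low_rank`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_density(x_next, x, a):
    """Unnormalized log p(x_next | x, a); toy Gaussian dynamics (assumed)."""
    mean = x + 0.1 * a
    return -0.5 * np.sum((x_next - mean) ** 2)

def grad_log_density(x_next, x, a):
    """Gradient of the log-density with respect to x_next."""
    mean = x + 0.1 * a
    return -(x_next - mean)

def hmc_sample(x, a, n_samples=200, n_leapfrog=20, step=0.1):
    """Draw next-state samples with Hamiltonian Monte Carlo (no burn-in, for brevity)."""
    samples, q = [], x.copy()
    for _ in range(n_samples):
        p = rng.standard_normal(q.shape)                     # resample momentum
        q_new, p_new = q.copy(), p.copy()
        p_new += 0.5 * step * grad_log_density(q_new, x, a)  # half momentum step
        for _ in range(n_leapfrog - 1):                      # alternating full steps
            q_new += step * p_new
            p_new += step * grad_log_density(q_new, x, a)
        q_new += step * p_new
        p_new += 0.5 * step * grad_log_density(q_new, x, a)  # final half step
        # Metropolis accept/reject using the Hamiltonian (negative log joint).
        h_old = -log_density(q, x, a) + 0.5 * p @ p
        h_new = -log_density(q_new, x, a) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:
            q = q_new
        samples.append(q.copy())
    return np.array(samples)

def complete_low_rank(Q_obs, mask, rank=2, n_iters=200):
    """Fill unobserved Q entries by iterating a rank-r SVD projection."""
    Q = np.where(mask, Q_obs, 0.0)
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(Q, full_matrices=False)
        Q_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-r approximation
        Q = np.where(mask, Q_obs, Q_low)              # keep observed entries fixed
    return Q

if __name__ == "__main__":
    # HMC next-state samples; their mean should approach x + 0.1 * a.
    x, a = np.zeros(4), np.ones(4)
    samples = hmc_sample(x, a)
    print("mean next state:", samples.mean(axis=0))

    # Completion of a synthetic rank-2 Q matrix observed at 30% of entries.
    true_Q = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
    mask = rng.uniform(size=true_Q.shape) < 0.3
    Q_hat = complete_low_rank(np.where(mask, true_Q, 0.0), mask, rank=2)
    print("relative error:", np.linalg.norm(Q_hat - true_Q) / np.linalg.norm(true_Q))
```

In the approach the abstract describes, samples like these would feed the empirical Bellman updates on a subset of state-action pairs, with the completion step recovering the remaining Q values; the toy script above only demonstrates each component in isolation.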