Agnostic Q-learning with function approximation in deterministic systems: Near-optimal bounds on approximation error and sample complexity

Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang

Research output: Contribution to journal › Conference article › peer-review


Abstract

The current paper studies the problem of agnostic Q-learning with function approximation in deterministic systems, where the optimal Q-function is approximable by a function in the class F with approximation error δ > 0. We propose a novel recursion-based algorithm and show that if δ = O(ρ/√dim_E), then one can find the optimal policy using O(dim_E) trajectories, where ρ is the gap between the optimal Q-value of the best actions and that of the second-best actions and dim_E is the Eluder dimension of F. Our result has two implications: 1. In conjunction with the lower bound in [Du et al., 2020], our upper bound suggests that the condition δ = Θ̃(ρ/√dim_E) is necessary and sufficient for algorithms with polynomial sample complexity. 2. In conjunction with the obvious lower bound in the tabular case, our upper bound suggests that the sample complexity Θ̃(dim_E) is tight in the agnostic setting. Therefore, we help address the open problem on agnostic Q-learning proposed in [Wen and Van Roy, 2013]. We further extend our algorithm to the stochastic reward setting and obtain similar results.
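The gap parameter ρ above measures how clearly the optimal action separates from the second-best one; the paper's guarantees scale with it. As a toy illustration only (not the paper's recursion-based algorithm), the following sketch computes the optimal Q-function by backward induction in a small, entirely hypothetical deterministic finite-horizon MDP and then extracts ρ:

```python
# Toy illustration (not the paper's algorithm): optimal Q-values and the
# suboptimality gap rho in a small deterministic, finite-horizon MDP.
# The MDP below (states, actions, rewards, horizon) is hypothetical.

H = 3                      # horizon
states = [0, 1]
actions = [0, 1]

def step(s, a):
    """Deterministic transition: returns (next state, reward)."""
    next_s = (s + a) % 2
    reward = 1.0 if (s == 0 and a == 1) else 0.25
    return next_s, reward

# Q[h][(s, a)] = optimal return from taking action a in state s at step h,
# filled in by backward induction from the last step.
Q = [dict() for _ in range(H + 1)]
for h in range(H - 1, -1, -1):
    for s in states:
        for a in actions:
            s2, r = step(s, a)
            future = max(Q[h + 1][(s2, b)] for b in actions) if h + 1 < H else 0.0
            Q[h][(s, a)] = r + future

# rho: the smallest difference between the best and second-best Q-values,
# over all (step, state) pairs where the two actions actually differ.
gaps = []
for h in range(H):
    for s in states:
        vals = sorted((Q[h][(s, a)] for a in actions), reverse=True)
        if vals[0] > vals[1]:
            gaps.append(vals[0] - vals[1])
rho = min(gaps)
print(rho)  # 0.75 for this toy MDP
```

In this toy instance ρ = 0.75; a larger gap makes the best action easier to identify from approximate Q-values, which is the intuition behind the condition δ = O(ρ/√dim_E) in the abstract.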

Original language: English (US)
Journal: Advances in Neural Information Processing Systems
Volume: 2020-December
State: Published - 2020
Event: 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: Dec 6 2020 – Dec 12 2020

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
