Abstract
Offline reinforcement learning, which aims at optimizing sequential decision-making strategies with historical data, has been extensively applied in real-life applications. State-of-the-art algorithms usually leverage powerful function approximators (e.g. neural networks) to alleviate the sample complexity hurdle for better empirical performance. Despite these successes, a more systematic understanding of the statistical complexity of function approximation remains lacking. To bridge this gap, we take a step by considering offline reinforcement learning with differentiable function class approximation (DFA). This function class naturally incorporates a wide range of models with nonlinear/nonconvex structures. We show that offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning (PFQL) algorithm, and our results provide a theoretical basis for understanding a variety of practical heuristics that rely on Fitted Q-Iteration-style design. In addition, we further improve our guarantee with a tighter instance-dependent characterization. We hope our work will draw interest in studying reinforcement learning with differentiable function approximation beyond the scope of current research.
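To make the Fitted Q-Iteration-style design mentioned in the abstract concrete, the minimal sketch below shows a backward-recursive fitted Q-iteration with a pessimistic value estimate over a differentiable function class (a small neural network). This is an illustrative reconstruction under stated assumptions, not the paper's exact PFQL algorithm: in particular, the ensemble-disagreement penalty, the dataset layout `data[h]`, and all hyperparameters are assumptions made purely for illustration.

```python
# Hedged sketch of pessimistic fitted Q-iteration with a differentiable
# function class (a small neural network). The disagreement of an ensemble
# stands in for the theoretically derived uncertainty penalty; dataset
# format and hyperparameters are assumptions, not the paper's specification.
import torch
import torch.nn as nn

def make_q(state_dim, action_dim, hidden=64):
    # Differentiable Q-function approximator over (state, action) pairs.
    return nn.Sequential(
        nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, 1),
    )

def fit_q(q_net, states, actions, targets, epochs=200, lr=1e-3):
    # Least-squares regression of Q(s, a) onto the Bellman backup targets.
    opt = torch.optim.Adam(q_net.parameters(), lr=lr)
    sa = torch.cat([states, actions], dim=-1)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((q_net(sa).squeeze(-1) - targets) ** 2).mean()
        loss.backward()
        opt.step()
    return q_net

def pessimistic_fqi(data, horizon, state_dim, action_space, n_ensemble=5, beta=1.0):
    """data[h] = (states, actions, rewards, next_states) tensors for step h;
    action_space = finite list of candidate action tensors of shape (1, action_dim)."""
    v_next = lambda s: torch.zeros(s.shape[0])           # V_{H+1} = 0
    q_functions = []
    for h in reversed(range(horizon)):
        states, actions, rewards, next_states = data[h]
        targets = rewards + v_next(next_states)          # Bellman backup target
        nets = [fit_q(make_q(state_dim, actions.shape[-1]), states, actions, targets)
                for _ in range(n_ensemble)]

        def q_pessimistic(s, a, nets=nets):
            # Mean prediction minus a disagreement penalty: pessimism proxy.
            sa = torch.cat([s, a], dim=-1)
            with torch.no_grad():
                qs = torch.stack([n(sa).squeeze(-1) for n in nets])
            return qs.mean(0) - beta * qs.std(0)

        def greedy_v(s, q=q_pessimistic):
            # Pessimistic value: maximize over the finite candidate action set.
            vals = torch.stack([q(s, a.expand(s.shape[0], -1)) for a in action_space])
            return vals.max(0).values.clamp(min=0.0)

        q_functions.insert(0, q_pessimistic)
        v_next = greedy_v
    return q_functions
```

The backward recursion over the horizon and the regression-then-penalize structure are the FQI-style ingredients the abstract refers to; the specific penalty used to instantiate pessimism here is only one plausible choice.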
| Original language | English (US) |
|---|---|
| State | Published - 2023 |
| Externally published | Yes |
| Event | 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda. Duration: May 1 2023 → May 5 2023 |
Conference

| Conference | 11th International Conference on Learning Representations, ICLR 2023 |
|---|---|
| Country/Territory | Rwanda |
| City | Kigali |
| Period | 5/1/23 → 5/5/23 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Computer Science Applications
- Education
- Linguistics and Language