Skip to main navigation Skip to search Skip to main content

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference

  • Botao Hao
  • , Xiang Ji
  • , Yaqi Duan
  • , Hao Lu
  • , Csaba Szepesvári
  • , Mengdi Wang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are poorly understood. In this paper, we study the use of bootstrapping in off-policy evaluation (OPE), and in particular, we focus on the fitted Q-evaluation (FQE) that is known to be minimax-optimal in the tabular and linear-model cases. We propose a bootstrapping FQE method for inferring the distribution of the policy evaluation error and show that this method is asymptotically efficient and distributionally consistent for off-policy statistical inference. To overcome the computation limit of bootstrapping, we further adapt a subsampling procedure that improves the runtime by an order of magnitude. We numerically evaluate the bootrapping method in classical RL environments for confidence interval estimation, estimating the variance of off-policy evaluator, and estimating the correlation between multiple off-policy evaluators.

Original languageEnglish (US)
Title of host publicationProceedings of the 38th International Conference on Machine Learning, ICML 2021
PublisherML Research Press
Pages4074-4084
Number of pages11
ISBN (Electronic)9781713845065
StatePublished - 2021
Event38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: Jul 18 2021Jul 24 2021

Publication series

NameProceedings of Machine Learning Research
Volume139
ISSN (Electronic)2640-3498

Conference

Conference38th International Conference on Machine Learning, ICML 2021
CityVirtual, Online
Period7/18/217/24/21

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Bootstrapping Fitted Q-Evaluation for Off-Policy Inference'. Together they form a unique fingerprint.

Cite this