TY - GEN
T1 - Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
AU - Hao, Botao
AU - Ji, Xiang
AU - Duan, Yaqi
AU - Lu, Hao
AU - Szepesvári, Csaba
AU - Wang, Mengdi
N1 - Publisher Copyright:
Copyright © 2021 by the author(s)
PY - 2021
Y1 - 2021
N2 - Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are poorly understood. In this paper, we study the use of bootstrapping in off-policy evaluation (OPE), and in particular, we focus on fitted Q-evaluation (FQE), which is known to be minimax-optimal in the tabular and linear-model cases. We propose a bootstrapping FQE method for inferring the distribution of the policy evaluation error and show that this method is asymptotically efficient and distributionally consistent for off-policy statistical inference. To overcome the computational limits of bootstrapping, we further adapt a subsampling procedure that improves the runtime by an order of magnitude. We numerically evaluate the bootstrapping method in classical RL environments for confidence interval estimation, estimating the variance of an off-policy evaluator, and estimating the correlation between multiple off-policy evaluators.
AB - Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical properties are poorly understood. In this paper, we study the use of bootstrapping in off-policy evaluation (OPE), and in particular, we focus on fitted Q-evaluation (FQE), which is known to be minimax-optimal in the tabular and linear-model cases. We propose a bootstrapping FQE method for inferring the distribution of the policy evaluation error and show that this method is asymptotically efficient and distributionally consistent for off-policy statistical inference. To overcome the computational limits of bootstrapping, we further adapt a subsampling procedure that improves the runtime by an order of magnitude. We numerically evaluate the bootstrapping method in classical RL environments for confidence interval estimation, estimating the variance of an off-policy evaluator, and estimating the correlation between multiple off-policy evaluators.
UR - http://www.scopus.com/inward/record.url?scp=85131930180&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131930180&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85131930180
T3 - Proceedings of Machine Learning Research
SP - 4074
EP - 4084
BT - Proceedings of the 38th International Conference on Machine Learning, ICML 2021
PB - ML Research Press
T2 - 38th International Conference on Machine Learning, ICML 2021
Y2 - 18 July 2021 through 24 July 2021
ER -