TY - GEN
T1 - Near-Optimal Representation Learning for Linear Bandits and Linear RL
AU - Hu, Jiachen
AU - Chen, Xiaoyu
AU - Jin, Chi
AU - Li, Lihong
AU - Wang, Liwei
N1 - Publisher Copyright:
Copyright © 2021 by the author(s)
PY - 2021
Y1 - 2021
AB - This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation. We first consider the setting where we play M linear bandits with dimension d concurrently, and these bandits share a common k-dimensional linear representation so that k ≪ d and k ≪ M. We propose a sample-efficient algorithm, MTLR-OFUL, which leverages the shared representation to achieve Õ(M√dkT + d√kMT) regret, where T is the total number of steps. Our regret significantly improves upon the baseline Õ(Md√T) achieved by solving each task independently. We further develop a lower bound showing that our regret is near-optimal when d > M. Furthermore, we extend the algorithm and analysis to multi-task episodic RL with linear value function approximation under low inherent Bellman error (Zanette et al., 2020a). To the best of our knowledge, this is the first theoretical result that characterizes the benefits of multi-task representation learning for exploration in RL with function approximation.
UR - http://www.scopus.com/inward/record.url?scp=85161347780&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85161347780&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85161347780
T3 - Proceedings of Machine Learning Research
SP - 4349
EP - 4358
BT - Proceedings of the 38th International Conference on Machine Learning, ICML 2021
PB - ML Research Press
T2 - 38th International Conference on Machine Learning, ICML 2021
Y2 - 18 July 2021 through 24 July 2021
ER -