Jiaqi Yang, Wei Hu, Jason D. Lee, Simon S. Du

Research output: Contribution to conference › Paper › peer-review

13 Scopus citations


We study how representation learning can improve the efficiency of bandit problems. We consider a setting in which we play T linear bandits of dimension d concurrently, and these T bandit tasks share a common k (≪ d)-dimensional linear representation. For the finite-action setting, we present a new algorithm that achieves Õ(T√(kN) + √(dkNT)) regret, where N is the number of rounds played for each bandit. When T is sufficiently large, our algorithm significantly outperforms the naive algorithm (playing the T bandits independently), which achieves Õ(T√(dN)) regret. We also provide an Ω(T√(kN) + √(dkNT)) regret lower bound, showing that our algorithm is minimax-optimal up to poly-logarithmic factors. Furthermore, we extend our algorithm to the infinite-action setting and obtain a corresponding regret bound that demonstrates the benefit of representation learning in certain regimes. Finally, we present experiments on synthetic and real-world data to illustrate our theoretical findings and demonstrate the effectiveness of our proposed algorithms.
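To see when the multi-task bound beats the naive one, the following sketch compares only the leading terms of the two regret bounds, dropping constants and poly-logarithmic factors; the function names and the specific parameter values are illustrative assumptions, not part of the paper.

```python
import math

def multitask_regret(T, N, d, k):
    # Leading terms of the multi-task bound: T*sqrt(k*N) + sqrt(d*k*N*T)
    # (constants and log factors omitted).
    return T * math.sqrt(k * N) + math.sqrt(d * k * N * T)

def naive_regret(T, N, d):
    # Leading term of the naive bound (T independent d-dimensional
    # bandits): T*sqrt(d*N).
    return T * math.sqrt(d * N)

# Illustrative regime: many tasks sharing a low-dimensional representation.
T, N, d, k = 100, 1000, 50, 5
print(multitask_regret(T, N, d, k))  # ~12071: T*sqrt(kN)=7071 plus sqrt(dkNT)=5000
print(naive_regret(T, N, d))         # ~22361: noticeably larger
```

With a single task (T = 1) the comparison flips, since the √(dkNT) term alone already exceeds √(dN); the benefit of the shared representation appears only once T is sufficiently large, matching the abstract's claim.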

Original language: English (US)
State: Published - 2021
Externally published: Yes
Event: 9th International Conference on Learning Representations, ICLR 2021 - Virtual, Online
Duration: May 3, 2021 – May 7, 2021


Conference: 9th International Conference on Learning Representations, ICLR 2021
City: Virtual, Online

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language


Dive into the research topics of 'IMPACT OF REPRESENTATION LEARNING IN LINEAR BANDITS'. Together they form a unique fingerprint.
