Regret Guarantees for Online Deep Control

Xinyi Chen, Edgar Minasyan, Jason D. Lee, Elad Hazan

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations

Abstract

Despite the immense success of deep learning in reinforcement learning and control, few theoretical guarantees for neural networks exist for these problems. Deriving performance guarantees is challenging because control is an online problem with no distributional assumptions and an agnostic learning objective, while the theory of deep learning so far focuses on supervised learning with a fixed known training set. In this work, we begin to resolve these challenges and derive the first regret guarantees in online control over a neural network-based policy class. In particular, we show sublinear episodic regret guarantees against a policy class parameterized by deep neural networks, a much richer class than previously considered linear policy parameterizations. Our results center on a reduction from online learning of neural networks to online convex optimization (OCO), and can use any OCO algorithm as a blackbox. Since online learning guarantees are inherently agnostic, we need to quantify the performance of the best policy in our policy class. To this end, we introduce the interpolation dimension, an expressivity metric, which we use to accompany our regret bounds. The results and findings in online deep learning are of independent interest and may have applications beyond online control.

Original language: English (US)
Pages (from-to): 1032-1045
Number of pages: 14
Journal: Proceedings of Machine Learning Research
Volume: 211
State: Published - 2023
Event: 5th Annual Conference on Learning for Dynamics and Control, L4DC 2023 - Philadelphia, United States
Duration: Jun 15 2023 → Jun 16 2023

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
