Improper Learning for Non-Stochastic Control

Max Simchowitz, Karan Singh, Elad Hazan

Research output: Contribution to journalConference articlepeer-review

64 Scopus citations

Abstract

We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states, known as non-stochastic control. We introduce a controller parametrization based on the denoised observations, and prove that applying online gradient descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies. In the fully-adversarial setting, our controller attains an optimal regret bound of √T-when the system is known, and, when combined with an initial stage of least-squares estimation, T2/3 when the system is unknown; both yield the first sublinear regret for the partially observed setting. Our bounds are the first in the non-stochastic control setting that compete with all stabilizing linear dynamical controllers, not just state feedback. Moreover, in the presence of semi-adversarial noise containing both stochastic and adversarial components, our controller attains the optimal regret bounds of poly(log T) when the system is known, and √T when unknown. To our knowledge, this gives the first end-to-end √T regret for online Linear Quadratic Gaussian controller, and applies in a more general setting with adversarial losses and semi-adversarial noise.

Original languageEnglish (US)
Pages (from-to)3320-3436
Number of pages117
JournalProceedings of Machine Learning Research
Volume125
StatePublished - 2020
Event33rd Conference on Learning Theory, COLT 2020 - Virtual, Online, Austria
Duration: Jul 9 2020Jul 12 2020

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Improper Learning for Non-Stochastic Control'. Together they form a unique fingerprint.

Cite this