Geometric exploration for online control

Orestis Plevrakis, Elad Hazan

Research output: Contribution to journalConference articlepeer-review

6 Scopus citations

Abstract

We study the control of an unknown linear dynamical system under general convex costs. The objective is minimizing regret vs the class of strongly-stable linear policies. In this work, we first consider the case of known cost functions, for which we design the first polynomial-time algorithm with n3 vT-regret, where n is the dimension of the state plus the dimension of control input. The vThorizon dependence is optimal, and improves upon the previous best known bound of T2/3. The main component of our algorithm is a novel geometric exploration strategy: we adaptively construct a sequence of barycentric spanners in an over-parameterized policy space. Second, we consider the case of bandit feedback, for which we give the first polynomial-time algorithm with poly(n)vT-regret, building on Stochastic Bandit Convex Optimization.

Original languageEnglish (US)
JournalAdvances in Neural Information Processing Systems
Volume2020-December
StatePublished - 2020
Event34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: Dec 6 2020Dec 12 2020

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint

Dive into the research topics of 'Geometric exploration for online control'. Together they form a unique fingerprint.

Cite this