We consider the problem of black-box control: the task of controlling an unknown linear time-invariant dynamical system from a single trajectory without a stabilizing controller. Under the assumption that the system is controllable, we give the first efficient algorithm that attains sublinear regret under the setting of online nonstochastic control. This resolves an open problem since the work of Abbasi-Yadkori and Szepesvári (2011) on the stochastic black-box LQR problem, and in a more general setting that allows for adversarial perturbations and adversarially chosen changing convex loss functions. We give finite-time regret bounds for our algorithm on the order of 2poly(d) + Õ(poly(d)T2/3) for general nonstochastic control, and 2poly(d) + Õ(poly(d)√T) for black-box LQR. To complete the picture, we investigate the complexity of the online black-box control problem and give a matching regret lower bound of 2Ω(d), showing that the exponential cost is inevitable. This lower bound holds even in the noiseless setting, and applies to any, randomized or deterministic, black-box control method.
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Control and Systems Engineering
- Statistics and Probability