TY - JOUR
T1 - Non-stochastic control with bandit feedback
AU - Gradu, Paula
AU - Hallman, John
AU - Hazan, Elad
N1 - Funding Information:
EH acknowledges support of NSF grant #1704860. All work was done while PG, JH, and EH were employed at Google.
Publisher Copyright:
© 2020 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2020
Y1 - 2020
AB - We study the problem of controlling a linear dynamical system with adversarial perturbations where the only feedback available to the controller is the scalar loss, and the loss function itself is unknown. For this problem, with either a known or unknown system, we give an efficient sublinear regret algorithm. The main algorithmic difficulty is the dependence of the loss on past controls. To overcome this issue, we propose an efficient algorithm for the general setting of bandit convex optimization for loss functions with memory, which may be of independent interest.
UR - http://www.scopus.com/inward/record.url?scp=85102128392&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102128392&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85102128392
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
Y2 - 6 December 2020 through 12 December 2020
ER -