TY - GEN
T1 - Accelerating Quadratic Optimization with Reinforcement Learning
AU - Ichnowski, Jeffrey
AU - Jain, Paras
AU - Stellato, Bartolomeo
AU - Banjac, Goran
AU - Luo, Michael
AU - Borrelli, Francesco
AU - Gonzalez, Joseph E.
AU - Stoica, Ion
AU - Goldberg, Ken
N1 - Funding Information:
This research was performed at the AUTOLAB at UC Berkeley in affiliation with the Berkeley AI Research (BAIR) Lab, and the CITRIS "People and Robots" (CPAR) Initiative. In addition to NSF CISE Expeditions Award CCF-1730628, this research is supported by gifts from Amazon Web Services, Ant Group, Ericsson, Facebook, Futurewei, Google, Intel, Microsoft, Nvidia, Scotiabank, Splunk and VMware. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors. We thank Ashwin Balakrishna, Arnav Gulati, as well as other colleagues for their helpful feedback.
Publisher Copyright:
© 2021 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2021
Y1 - 2021
N2 - First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be rapidly solved. These methods face two persistent challenges: manual hyperparameter tuning and convergence time to high-accuracy solutions. To address these, we explore how Reinforcement Learning (RL) can learn a policy to tune parameters to accelerate convergence. In experiments with well-known QP benchmarks we find that our RL policy, RLQP, significantly outperforms state-of-the-art QP solvers by up to 3x. RLQP generalizes surprisingly well to previously unseen problems with varying dimension and structure from different applications, including the QPLIB, Netlib LP and Maros-Mészáros problems. Code, models, and videos are available at https://berkeleyautomation.github.io/rlqp/.
AB - First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be rapidly solved. These methods face two persistent challenges: manual hyperparameter tuning and convergence time to high-accuracy solutions. To address these, we explore how Reinforcement Learning (RL) can learn a policy to tune parameters to accelerate convergence. In experiments with well-known QP benchmarks we find that our RL policy, RLQP, significantly outperforms state-of-the-art QP solvers by up to 3x. RLQP generalizes surprisingly well to previously unseen problems with varying dimension and structure from different applications, including the QPLIB, Netlib LP and Maros-Mészáros problems. Code, models, and videos are available at https://berkeleyautomation.github.io/rlqp/.
UR - http://www.scopus.com/inward/record.url?scp=85132581171&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85132581171&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85132581171
T3 - Advances in Neural Information Processing Systems
SP - 21043
EP - 21055
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural Information Processing Systems Foundation
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -