TY - GEN
T1 - Multi-Issue Butterfly Architecture for Sparse Convex Quadratic Programming
AU - Wang, Maolin
AU - McInerney, Ian
AU - Stellato, Bartolomeo
AU - Tu, Fengbin
AU - Boyd, Stephen
AU - So, Hayden Kwok Hay
AU - Cheng, Kwang Ting
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Convex quadratic optimization solvers are extensively utilized in various domains; however, achieving optimal performance in diverse situations remains a significant challenge due to the sparse nature of objective and constraint matrices. General-purpose architectures struggle with hardware utilization when performing critical sparse matrix operations, such as factorization and multiplication. To address this issue, we introduce a pipelined spatial architecture, Multi-Issue Butterfly (MIB), which supports all primitive scalar, vector, and matrix operations required by the Alternating Direction Method of Multipliers (ADMM) based solver algorithm. The proposed architecture features a butterfly computational network with innovative working modes for each node, controlled by runtime instructions. We developed a companion scheduling method for matrix operations based on their sparsity patterns. For factorization, an elimination tree guides the network instructions reordering to avoid data hazards caused by computation dependencies. For matrix-vector multiplication, data prefetching resolves structural hazards caused by read and write conflicts to register files. Instructions without hazards are issued simultaneously to increase pipeline throughput and function unit utilization. We evaluate the proposed architecture using FPGA prototypes, representing the first fully FPGA-based generic QP solver. Our assessment includes extensive performance and efficiency bench-marks across 100 QP problems from five application domains. Compared to the same algorithm variation running on CPU backends, our prototype achieves a geometric mean of 30.5× end-To-end speedup, 127.0 × greater energy efficiency, and 16.5× less runtime jitter. In comparison to GPU backends, the prototype attains a geometric mean of 4.3× faster end-To-end speedup, 21.7× higher energy efficiency, and 33.4× less runtime jitter.
AB - Convex quadratic optimization solvers are extensively utilized in various domains; however, achieving optimal performance in diverse situations remains a significant challenge due to the sparse nature of objective and constraint matrices. General-purpose architectures struggle with hardware utilization when performing critical sparse matrix operations, such as factorization and multiplication. To address this issue, we introduce a pipelined spatial architecture, Multi-Issue Butterfly (MIB), which supports all primitive scalar, vector, and matrix operations required by the Alternating Direction Method of Multipliers (ADMM) based solver algorithm. The proposed architecture features a butterfly computational network with innovative working modes for each node, controlled by runtime instructions. We developed a companion scheduling method for matrix operations based on their sparsity patterns. For factorization, an elimination tree guides the network instructions reordering to avoid data hazards caused by computation dependencies. For matrix-vector multiplication, data prefetching resolves structural hazards caused by read and write conflicts to register files. Instructions without hazards are issued simultaneously to increase pipeline throughput and function unit utilization. We evaluate the proposed architecture using FPGA prototypes, representing the first fully FPGA-based generic QP solver. Our assessment includes extensive performance and efficiency bench-marks across 100 QP problems from five application domains. Compared to the same algorithm variation running on CPU backends, our prototype achieves a geometric mean of 30.5× end-To-end speedup, 127.0 × greater energy efficiency, and 16.5× less runtime jitter. In comparison to GPU backends, the prototype attains a geometric mean of 4.3× faster end-To-end speedup, 21.7× higher energy efficiency, and 33.4× less runtime jitter.
KW - FPGA
KW - convex optimization
KW - out-of-order instruction issue
KW - quadratic programming
UR - https://www.scopus.com/pages/publications/85213374673
UR - https://www.scopus.com/pages/publications/85213374673#tab=citedBy
U2 - 10.1109/MICRO61859.2024.00115
DO - 10.1109/MICRO61859.2024.00115
M3 - Conference contribution
AN - SCOPUS:85213374673
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 1574
EP - 1587
BT - Proceedings - 2024 57th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2024
PB - IEEE Computer Society
T2 - 57th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2024
Y2 - 2 November 2024 through 6 November 2024
ER -