TY - GEN
T1 - Usage-based RTL subsetting for hardware accelerators
AU - Tan, Qinhan
AU - Gupta, Aarti
AU - Malik, Sharad
N1 - Publisher Copyright:
© 2022 Association for Computing Machinery.
PY - 2022/10/30
Y1 - 2022/10/30
N2 - Recent years have witnessed increasing use of domain-specific accelerators in computing platforms to provide power-performance efficiency for emerging applications. To increase their applicability within the domain, these accelerators tend to support a large set of functions, e.g. Nvidia s open-source Deep Learning Accelerator, NVDLA, supports five distinct groups of functions [17]. However, an individual use case of an accelerator may utilize only a subset of these functions. The unused functions lead to unnecessary overhead of silicon area, power, and hardware verification/hardware-software co-verification complexity. This motivates our research question: Given an RTL design for an accelerator and a subset of functions of interest, can we automatically extract a subset of the RTL that is sufficient for these functions and sequentially equivalent to the original RTL? We call this the Usage-based RTL Subsetting problem, referred to as the RTL subsetting problem in short. We first formally define this problem and show that it can be formulated as a program synthesis problem, which can be solved by performing expensive hyperproperty checks. To overcome the high cost, we propose multiple levels of sound over-approximations to construct an effective algorithm based on relatively less expensive temporal property checking and taint analysis for information flow checking. We demonstrate the acceptable computation cost and the quality of the results of our algorithm through several case studies of accelerators from different domains. The applicability of our proposed algorithm can be seen in its ability to subset the large NVDLA accelerator (with over 50,000 registers and 1,600,000 gates) for the group of convolution functions, where the subset reduces the total number of registers by 18.6% and the total number of gates by 37.1%.
AB - Recent years have witnessed increasing use of domain-specific accelerators in computing platforms to provide power-performance efficiency for emerging applications. To increase their applicability within the domain, these accelerators tend to support a large set of functions, e.g. Nvidia s open-source Deep Learning Accelerator, NVDLA, supports five distinct groups of functions [17]. However, an individual use case of an accelerator may utilize only a subset of these functions. The unused functions lead to unnecessary overhead of silicon area, power, and hardware verification/hardware-software co-verification complexity. This motivates our research question: Given an RTL design for an accelerator and a subset of functions of interest, can we automatically extract a subset of the RTL that is sufficient for these functions and sequentially equivalent to the original RTL? We call this the Usage-based RTL Subsetting problem, referred to as the RTL subsetting problem in short. We first formally define this problem and show that it can be formulated as a program synthesis problem, which can be solved by performing expensive hyperproperty checks. To overcome the high cost, we propose multiple levels of sound over-approximations to construct an effective algorithm based on relatively less expensive temporal property checking and taint analysis for information flow checking. We demonstrate the acceptable computation cost and the quality of the results of our algorithm through several case studies of accelerators from different domains. The applicability of our proposed algorithm can be seen in its ability to subset the large NVDLA accelerator (with over 50,000 registers and 1,600,000 gates) for the group of convolution functions, where the subset reduces the total number of registers by 18.6% and the total number of gates by 37.1%.
UR - http://www.scopus.com/inward/record.url?scp=85145660000&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85145660000&partnerID=8YFLogxK
U2 - 10.1145/3508352.3549391
DO - 10.1145/3508352.3549391
M3 - Conference contribution
AN - SCOPUS:85145660000
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
BT - Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 41st IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2022
Y2 - 30 October 2022 through 4 November 2022
ER -