TY - JOUR
T1 - MemFlow
T2 - Memory-Driven Data Scheduling with Datapath Co-Design in Accelerators for Large-Scale Inference Applications
AU - Nie, Qi
AU - Malik, Sharad
N1 - Funding Information:
Manuscript received October 21, 2018; revised January 12, 2019 and March 31, 2019; accepted May 18, 2019. Date of publication June 27, 2019; date of current version August 20, 2020. This work was supported by the Applications Driving Architectures (ADA) Research Center, a JUMP Center co-sponsored by SRC and DARPA. This paper was recommended by Associate Editor Y. Wang. (Corresponding author: Qi Nie.) The authors are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: qnie@princeton.edu). Digital Object Identifier 10.1109/TCAD.2019.2925377
Publisher Copyright:
© 1982-2012 IEEE.
PY - 2020/9
Y1 - 2020/9
N2 - The increasing importance of inference algorithms, such as neural networks (NNs), principal component analysis (PCA), and singular value decomposition (SVD), has led to the emergence of hardware accelerators that address power-performance tradeoffs in their implementation. The large data sets of these algorithms make DRAM access the bottleneck for both power and performance. Private SRAM scratch-pad memory is used to mitigate the DRAM access penalty, but it is a resource limited in both size and bandwidth. Thus, accelerator design is not just about computation, but also about how data flow is scheduled across the memory hierarchy: DRAM, scratch-pad SRAM, and datapath registers. Current accelerator design tools automate the generation of customized datapaths to improve performance, but offer limited support for reducing DRAM/SRAM accesses during the computation. In this paper, we propose a memory-driven accelerator design methodology for large-scale inference applications that maximizes data access in the datapath and SRAM. We demonstrate its efficacy using several key kernels from large-scale inference applications.
AB - The increasing importance of inference algorithms, such as neural networks (NNs), principal component analysis (PCA), and singular value decomposition (SVD), has led to the emergence of hardware accelerators that address power-performance tradeoffs in their implementation. The large data sets of these algorithms make DRAM access the bottleneck for both power and performance. Private SRAM scratch-pad memory is used to mitigate the DRAM access penalty, but it is a resource limited in both size and bandwidth. Thus, accelerator design is not just about computation, but also about how data flow is scheduled across the memory hierarchy: DRAM, scratch-pad SRAM, and datapath registers. Current accelerator design tools automate the generation of customized datapaths to improve performance, but offer limited support for reducing DRAM/SRAM accesses during the computation. In this paper, we propose a memory-driven accelerator design methodology for large-scale inference applications that maximizes data access in the datapath and SRAM. We demonstrate its efficacy using several key kernels from large-scale inference applications.
KW - Accelerator
KW - data scheduling
KW - hardware/software co-design
KW - large-scale computing
KW - memory utilization
UR - http://www.scopus.com/inward/record.url?scp=85068120652&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068120652&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2019.2925377
DO - 10.1109/TCAD.2019.2925377
M3 - Article
AN - SCOPUS:85068120652
SN - 0278-0070
VL - 39
SP - 1875
EP - 1888
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 9
M1 - 8747420
ER -