TY - GEN
T1 - A Programmable Neural-Network Inference Accelerator Based on Scalable In-Memory Computing
AU - Jia, Hongyang
AU - Ozatay, Murat
AU - Tang, Yinqi
AU - Valavi, Hossein
AU - Pathak, Rakshit
AU - Lee, Jinseok
AU - Verma, Naveen
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/2/13
Y1 - 2021/2/13
AB - This paper presents a scalable neural-network (NN) inference accelerator in 16nm, based on an array of programmable cores employing mixed-signal In-Memory Computing (IMC), digital Near-Memory Computing (NMC), and localized buffering/control. IMC achieves high energy efficiency and throughput for matrix-vector multiplications (MVMs), which dominate NNs; however, scalability poses numerous challenges, both technological (moving to advanced nodes to maintain gains over digital architectures) and architectural (supporting full execution of diverse NNs). Recent demonstrations have explored integrating IMC in programmable processors [1, 2], but have not achieved IMC efficiency and throughput for full executions. The central challenge is the drastically different physical design point and associated tradeoffs incurred by IMC compared to digital engines. Namely, IMC substantially increases compute energy efficiency and HW density/parallelism, but retains the overheads of HW virtualization (state and data swapping/buffering/communication across spatial/temporal computation mappings). To overcome these overheads, the demonstrated architecture is co-designed with SW-mapping algorithms (encapsulated in a custom graph compiler) to provide efficiency across a broad range of mapping strategies.
UR - http://www.scopus.com/inward/record.url?scp=85102365085&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102365085&partnerID=8YFLogxK
DO - 10.1109/ISSCC42613.2021.9365788
M3 - Conference contribution
AN - SCOPUS:85102365085
T3 - Digest of Technical Papers - IEEE International Solid-State Circuits Conference
SP - 236
EP - 238
BT - 2021 IEEE International Solid-State Circuits Conference, ISSCC 2021 - Digest of Technical Papers
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Solid-State Circuits Conference, ISSCC 2021
Y2 - 13 February 2021 through 22 February 2021
ER -