TY - JOUR
T1 - An In-Memory-Computing DNN Achieving 700 TOPS/W and 6 TOPS/mm² in 130-nm CMOS
AU - Zhang, Jintao
AU - Verma, Naveen
N1 - Funding Information:
Manuscript received December 18, 2018; revised February 25, 2019; accepted April 2, 2019. Date of publication April 22, 2019; date of current version June 11, 2019. This work was supported in part by the Air Force Research Laboratory (AFRL) and in part by the Defense Advanced Research Projects Agency (DARPA) under Agreement FA8650-18-2-7866. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) or the U.S. Government. This paper was recommended by Guest Editor C.-Y. Chen. (Corresponding author: Jintao Zhang.) The authors are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: jintao@princeton.edu; nverma@princeton.edu).
Publisher Copyright:
© 2019 IEEE.
PY - 2019/6
Y1 - 2019/6
AB - Deep neural networks (DNNs) are increasingly popular in machine learning and have achieved state-of-the-art performance in a range of tasks. Typically, the best results are obtained using large amounts of training data and large models, which make both training and inference complex. While GPUs are used in many applications for the parallel computing they provide, lower-energy platforms have the potential to enable a range of new applications. A trend being observed is the ability to reduce the precision of weights and activations, with previous research showing that in some cases weights and activations can be binarized [i.e., binarized neural networks (BNNs)], significantly reducing the model size. Exploiting this toward reduced compute energy and reduced data-movement energy, we demonstrate a BNN mapped to a previously presented in-memory-computing architecture, where binarized weights are stored in a standard 6T SRAM bit cell and computations are performed via an analog operation. Using a reduced-size BNN, chosen to fit on the CMOS prototype (in 130 nm), MNIST classification is achieved with only 0.4% accuracy degradation (from 94%), but at 26× lower energy compared to a digital approach implementing the same network. The system reaches over 700-TOPS/W energy efficiency and 6-TOPS/mm² throughput.
KW - Machine learning
KW - binary neural network
KW - deep neural network
KW - in-memory computing
KW - system-on-chip
UR - http://www.scopus.com/inward/record.url?scp=85067339094&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85067339094&partnerID=8YFLogxK
U2 - 10.1109/JETCAS.2019.2912352
DO - 10.1109/JETCAS.2019.2912352
M3 - Article
AN - SCOPUS:85067339094
SN - 2156-3357
VL - 9
SP - 358
EP - 366
JO - IEEE Journal on Emerging and Selected Topics in Circuits and Systems
JF - IEEE Journal on Emerging and Selected Topics in Circuits and Systems
IS - 2
M1 - 8695076
ER -