TY - JOUR
T1 - A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute
AU - Valavi, Hossein
AU - Ramadge, Peter Jeffrey
AU - Nestler, Eric
AU - Verma, Naveen
N1 - Funding Information:
Manuscript received October 23, 2018; revised January 4, 2019 and February 5, 2019; accepted February 9, 2019. Date of publication March 5, 2019; date of current version May 24, 2019. This paper was approved by Associate Editor Vivek De. This work was supported in part by a gift from Analog Devices Inc. (ADI). (Corresponding author: Hossein Valavi.) H. Valavi, P. J. Ramadge, and N. Verma are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: hvalavi@princeton.edu). E. Nestler is with Analog Devices Inc., Boston, MA 02110 USA. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSSC.2019.2899730
Publisher Copyright:
© 2019 IEEE.
PY - 2019/6
Y1 - 2019/6
N2 - Large-scale matrix-vector multiplications, which dominate in deep neural networks (DNNs), are limited by data movement in modern VLSI technologies. This paper addresses data movement via an in-memory-computing accelerator that employs charge-domain mixed-signal operation for enhancing compute SNR and, thus, scalability. The architecture supports analog/binary input activation (IA)/weight first layer (FL) and binary/binary IA/weight hidden layers (HLs), with batch normalization and input-output (IO) buffering circuitry to enable cascading, if desired, for realizing different DNN layers. The architecture is arranged as 8×8 = 64 in-memory-computing neuron tiles, supporting up to 512 3×3×512-input HL neurons and 64 3×3×3-input FL neurons, configurable via tile-level clock gating. In-memory computing is achieved using an 8T bit cell with an overlaying metal-oxide-metal (MOM) capacitor, yielding a structure having 1.8× the area of a standard 6T bit cell. Implemented in 65-nm CMOS, the design achieves HL/FL energy efficiency of 866/1.25 TOPS/W and throughput of 18876/43.2 GOPS (1498/3.43 GOPS/mm²) when implementing convolution layers, and 658/0.95 TOPS/W and 9438/10.47 GOPS (749/0.83 GOPS/mm²) when implementing convolution followed by batch-normalization layers. Several large-scale neural networks are demonstrated, showing performance on standard benchmarks (MNIST, CIFAR-10, and SVHN) equivalent to ideal digital computing.
AB - Large-scale matrix-vector multiplications, which dominate in deep neural networks (DNNs), are limited by data movement in modern VLSI technologies. This paper addresses data movement via an in-memory-computing accelerator that employs charge-domain mixed-signal operation for enhancing compute SNR and, thus, scalability. The architecture supports analog/binary input activation (IA)/weight first layer (FL) and binary/binary IA/weight hidden layers (HLs), with batch normalization and input-output (IO) buffering circuitry to enable cascading, if desired, for realizing different DNN layers. The architecture is arranged as 8×8 = 64 in-memory-computing neuron tiles, supporting up to 512 3×3×512-input HL neurons and 64 3×3×3-input FL neurons, configurable via tile-level clock gating. In-memory computing is achieved using an 8T bit cell with an overlaying metal-oxide-metal (MOM) capacitor, yielding a structure having 1.8× the area of a standard 6T bit cell. Implemented in 65-nm CMOS, the design achieves HL/FL energy efficiency of 866/1.25 TOPS/W and throughput of 18876/43.2 GOPS (1498/3.43 GOPS/mm²) when implementing convolution layers, and 658/0.95 TOPS/W and 9438/10.47 GOPS (749/0.83 GOPS/mm²) when implementing convolution followed by batch-normalization layers. Several large-scale neural networks are demonstrated, showing performance on standard benchmarks (MNIST, CIFAR-10, and SVHN) equivalent to ideal digital computing.
KW - Charge-domain compute
KW - deep learning
KW - hardware accelerators
KW - in-memory computing
KW - neural networks
UR - http://www.scopus.com/inward/record.url?scp=85066442557&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85066442557&partnerID=8YFLogxK
U2 - 10.1109/JSSC.2019.2899730
DO - 10.1109/JSSC.2019.2899730
M3 - Article
AN - SCOPUS:85066442557
SN - 0018-9200
VL - 54
SP - 1789
EP - 1799
JO - IEEE Journal of Solid-State Circuits
JF - IEEE Journal of Solid-State Circuits
IS - 6
M1 - 8660469
ER -