Scalable and Programmable Neural Network Inference Accelerator Based on In-Memory Computing

Hongyang Jia, Murat Ozatay, Yinqi Tang, Hossein Valavi, Rakshit Pathak, Jinseok Lee, Naveen Verma

Research output: Contribution to journal › Article › peer-review

65 Scopus citations

Abstract

This work demonstrates a programmable in-memory-computing (IMC) inference accelerator for scalable execution of neural network (NN) models, leveraging a high-signal-to-noise ratio (SNR) capacitor-based analog technology. IMC accelerates computations and reduces memory accessing for matrix-vector multiplies (MVMs), which dominate in NNs. The accelerator architecture focuses on scalable execution, addressing the overheads of state swapping and the challenges of maintaining high utilization across highly dense and parallel hardware. The architecture is based on a configurable on-chip network (OCN) and scalable array of cores, which integrate mixed-signal IMC with programmable near-memory single-instruction multiple-data (SIMD) digital computing, configurable buffering, and programmable control. The cores enable flexible NN execution mappings that exploit data- and pipeline-parallelism to address utilization and efficiency across models. A prototype is presented, incorporating a 4 × 4 array of cores demonstrated in 16 nm CMOS, achieving peak multiply-accumulate (MAC)-level throughput of 3 TOPS and peak MAC-level energy efficiency of 30 TOPS/W, both for 8-b operations. The measured results show high accuracy of the analog computations, matching bit-true simulations. This enables the abstractions required for robust and scalable architectural and software integration. Developed software libraries and NN-mapping tools are used to demonstrate CIFAR-10 and ImageNet classification, with an 11-layer CNN and ResNet-50, respectively, achieving accuracy, throughput, and energy efficiency of 91.51% and 73.33%, 7815 and 581 image/s, 51.5 k and 3.0 k image/s/W, with 4-b weights and activations.
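The throughput and efficiency figures quoted in the abstract can be related by simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes that each image/s figure and its matching image/s/W figure were measured at the same operating point, which the abstract does not state.

```python
# Figures quoted in the abstract (4-b weights and activations).
# ASSUMPTION: throughput and efficiency for a given model were
# measured at the same operating point; the abstract does not say so.
reported = {
    "CIFAR-10 (11-layer CNN)": {"images_per_s": 7815.0, "images_per_s_per_w": 51.5e3},
    "ImageNet (ResNet-50)": {"images_per_s": 581.0, "images_per_s_per_w": 3.0e3},
}


def implied_power_w(images_per_s: float, images_per_s_per_w: float) -> float:
    """Power in watts implied by dividing throughput by efficiency."""
    return images_per_s / images_per_s_per_w


def energy_per_image_j(images_per_s_per_w: float) -> float:
    """Energy per image in joules: (image/s)/W is image/J, so invert it."""
    return 1.0 / images_per_s_per_w


def energy_per_op_fj(tops_per_w: float) -> float:
    """Energy per operation in femtojoules for a TOPS/W figure.

    1 TOPS/W = 1e12 op/J, so J/op = 1 / (tops_per_w * 1e12);
    multiplying by 1e15 fJ/J gives 1e3 / tops_per_w.
    """
    return 1e3 / tops_per_w


if __name__ == "__main__":
    for model, r in reported.items():
        p = implied_power_w(r["images_per_s"], r["images_per_s_per_w"])
        e = energy_per_image_j(r["images_per_s_per_w"])
        print(f"{model}: implied power {p:.3f} W, energy {e * 1e6:.1f} uJ/image")
    # Peak 8-b MAC-level efficiency quoted in the abstract.
    print(f"30 TOPS/W -> {energy_per_op_fj(30.0):.1f} fJ per operation")
```

Under that assumption, the implied power is roughly 0.15 W for the CIFAR-10 network and roughly 0.19 W for ResNet-50. The per-operation energy uses whatever operation-counting convention lies behind the reported TOPS/W figure.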

Original language: English (US)
Pages (from-to): 198-211
Number of pages: 14
Journal: IEEE Journal of Solid-State Circuits
Volume: 57
Issue number: 1
State: Published - Jan 1 2022

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering

Keywords

  • Deep learning
  • Hardware accelerators
  • In-memory computing (IMC)
  • Neural networks (NNs)
  • Scalable architecture
