This work demonstrates a programmable in-memory-computing (IMC) inference accelerator for scalable execution of neural network (NN) models, leveraging a high-signal-to-noise ratio (SNR) capacitor-based analog technology. IMC accelerates computations and reduces memory accessing for matrix-vector multiplies (MVMs), which dominate in NNs. The accelerator architecture focuses on scalable execution, addressing the overheads of state swapping and the challenges of maintaining high utilization across highly dense and parallel hardware. The architecture is based on a configurable on-chip network (OCN) and scalable array of cores, which integrate mixed-signal IMC with programmable near-memory single-instruction multiple-data (SIMD) digital computing, configurable buffering, and programmable control. The cores enable flexible NN execution mappings that exploit data- and pipeline-parallelism to address utilization and efficiency across models. A prototype is presented, incorporating a 4 × 4 array of cores demonstrated in 16 nm CMOS, achieving peak multiply-accumulate (MAC)-level throughput of 3 TOPS and peak MAC-level energy efficiency of 30 TOPS/W, both for 8-b operations. The measured results shows high accuracy of the analog computations, matching bit-true simulations. This enables the abstractions required for robust and scalable architectural and software integration. Developed software libraries and NN-mapping tools are used to demonstrate CIFAR-10 and ImageNet classification, with an 11-layer CNN and ResNet-50, respectively, achieving accuracy, throughput, and energy efficiency of 91.51% and 73.33%, 7815 and 581 image/s, 51.5 k and 3.0 k image/s/W, with 4-b weights and activations.
All Science Journal Classification (ASJC) codes
- Electrical and Electronic Engineering
- Deep learning
- Hardware accelerators
- In-memory computing (IMC)
- Neural networks (NNs)
- Scalable architecture