Abstract
Deep neural networks (DNNs) are increasingly popular in machine learning and have achieved state-of-the-art performance in a range of tasks. Typically, the best results are achieved using large amounts of training data and large models, which makes both training and inference complex. While GPUs are used in many applications for the parallelism they provide, lower-energy platforms have the potential to enable a range of new applications. An emerging trend is reducing the precision of weights and activations, with previous research showing that in some cases weights and activations can be binarized [i.e., binarized neural networks (BNNs)], significantly reducing the model size. Exploiting this to reduce both compute energy and data-movement energy, we demonstrate the BNN mapped to a previously presented in-memory-computing architecture, where binarized weights are stored in a standard 6T SRAM bit cell and computations are performed via an analog operation. Using a reduced-size BNN, chosen to fit on the CMOS prototype (in 130 nm), MNIST classification is achieved with only 0.4% accuracy degradation (from 94%), but at 26× lower energy compared to a digital approach implementing the same network. The system reaches over 700-TOPS/W energy efficiency and 6-TOPS/mm² throughput.
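The arithmetic behind the energy saving claimed above is that, once weights and activations are restricted to {-1, +1}, each multiply-accumulate collapses to a match/mismatch count (XNOR followed by popcount), which is the kind of aggregate the SRAM array can evaluate in analog across its bit cells. Below is a minimal NumPy sketch of this equivalence; the function names and data are illustrative and not taken from the paper.

```python
import numpy as np

def binarize(x: np.ndarray) -> np.ndarray:
    """Map real values to {-1, +1}, as in a binarized neural network (BNN)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def bnn_dot(w: np.ndarray, a: np.ndarray) -> int:
    """Dot product of binarized weights and activations.

    With w, a in {-1, +1}, each product w_i * a_i is +1 on a match and
    -1 on a mismatch, so the sum equals (#matches - #mismatches).
    This XNOR/popcount form replaces full multipliers in a BNN.
    """
    matches = int(np.sum(w == a))  # XNOR, then popcount
    return 2 * matches - len(w)    # matches - mismatches

# Illustrative usage with random data (not values from the paper).
rng = np.random.default_rng(0)
w = binarize(rng.standard_normal(128))
a = binarize(rng.standard_normal(128))
assert bnn_dot(w, a) == int(w.astype(np.int32) @ a.astype(np.int32))
print("binary dot product:", bnn_dot(w, a))
```

Because the result depends only on the number of matching bit positions, a hardware implementation needs no multipliers at all, which is what makes the in-memory analog accumulation across 6T SRAM bit cells feasible.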
| Original language | English (US) |
| --- | --- |
| Article number | 8695076 |
| Pages (from-to) | 358-366 |
| Number of pages | 9 |
| Journal | IEEE Journal on Emerging and Selected Topics in Circuits and Systems |
| Volume | 9 |
| Issue number | 2 |
| DOIs | |
| State | Published - Jun 2019 |
All Science Journal Classification (ASJC) codes
- Electrical and Electronic Engineering
Keywords
- Machine learning
- binary neural network
- deep neural network
- in-memory computing
- system-on-chip