An In-Memory-Computing DNN Achieving 700 TOPS/W and 6 TOPS/mm² in 130-nm CMOS

Jintao Zhang, Naveen Verma

Research output: Contribution to journal › Article

Abstract

Deep neural networks (DNNs) are increasingly popular in machine learning and have achieved state-of-the-art performance on a range of tasks. Typically, the best results are obtained with large amounts of training data and large models, which makes both training and inference computationally costly. While GPUs are used in many applications for the parallelism they provide, lower-energy platforms have the potential to enable a range of new applications. An emerging trend is reducing the precision of weights and activations; previous research has shown that, in some cases, weights and activations can be binarized [i.e., binarized neural networks (BNNs)], significantly reducing the model size. Exploiting this toward reduced compute energy and reduced data-movement energy, we demonstrate a BNN mapped to a previously presented in-memory-computing architecture, where binarized weights are stored in standard 6T SRAM bit cells and computations are performed via an analog operation. Using a reduced-size BNN, chosen to fit on the CMOS prototype (in 130 nm), MNIST classification is achieved with only 0.4% accuracy degradation (from 94%), but at 26× lower energy than a digital implementation of the same network. The system reaches over 700-TOPS/W energy efficiency and 6-TOPS/mm² throughput.
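To make the binarization concept concrete, the sketch below shows the core BNN arithmetic: a dot product between {-1, +1} weights and activations, and its equivalent XNOR-popcount form, which is why a single binary weight can occupy one 6T SRAM bit cell. This is a minimal NumPy illustration of general BNN arithmetic, not the paper's analog in-memory implementation; the vector length and variable names are hypothetical.

    # Illustrative BNN dot product (not the chip's analog implementation).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 64  # hypothetical vector length

    # Binarized weights and activations in {-1, +1}
    w = rng.choice([-1, 1], size=n)
    x = rng.choice([-1, 1], size=n)

    # Reference: ordinary signed dot product
    dot_signed = int(np.dot(w, x))

    # Bitwise equivalent: map {-1, +1} -> {0, 1}; XNOR marks positions where
    # weight and activation agree, and
    #   dot = (#agreements) - (#disagreements) = 2 * popcount(XNOR) - n
    wb = (w > 0).astype(np.uint8)
    xb = (x > 0).astype(np.uint8)
    xnor = ~(wb ^ xb) & 1
    dot_bitwise = 2 * int(xnor.sum()) - n

    assert dot_signed == dot_bitwise
    print(dot_signed, dot_bitwise)

Because the multiply reduces to a 1-bit agreement check and the accumulate to a count, this is broadly the kind of operation that an in-memory-computing architecture can perform in parallel in the analog domain over the stored weights.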

Original language: English (US)
Article number: 8695076
Pages (from-to): 358-366
Number of pages: 9
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems
Volume: 9
Issue number: 2
DOI: 10.1109/JETCAS.2019.2912352
State: Published - Jun 1, 2019


All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering

Keywords

  • Machine learning
  • binary neural network
  • deep neural network
  • in-memory computing
  • system-on-chip

Cite this


Zhang, Jintao; Verma, Naveen. An In-Memory-Computing DNN Achieving 700 TOPS/W and 6 TOPS/mm² in 130-nm CMOS. In: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, Vol. 9, No. 2, 8695076, 01.06.2019, p. 358-366. DOI: 10.1109/JETCAS.2019.2912352
