Neural Network Training with Stochastic Hardware Models and Software Abstractions

Bonan Zhang, Lung Yen Chen, Naveen Verma

Research output: Contribution to journal, Article, peer-reviewed

10 Scopus citations

Abstract

Machine learning inference is of broad interest, increasingly in energy-constrained applications. However, such platforms are often pushed to their energy limits, especially by deep learning models, which provide state-of-the-art inference performance but are also computationally intensive. This has motivated algorithmic co-design, where flexibility in the model and its parameters, derived from training, is exploited for hardware energy efficiency. This work extends a model-training algorithm referred to as Stochastic Data-Driven Hardware Resilience (S-DDHR) to enable statistical models of computations, which are amenable to energy/throughput-aggressive hardware operating points as well as to emerging variation-prone device technologies. S-DDHR itself extends the previous approach of DDHR by incorporating the statistical distribution of hardware variations into model-parameter learning, rather than a single sample drawn from that distribution. This is critical to developing accurate and composable abstractions of computations, which enable scalable, hardware-generalized training rather than training hardware instance by instance. S-DDHR is demonstrated and evaluated on a bit-scalable MRAM-based in-memory computing architecture, whose energy/throughput trade-offs explicitly motivate statistical computations. Using foundry data to model MRAM device variations, S-DDHR is shown to preserve high inference performance on benchmark datasets (MNIST, CIFAR-10, SVHN) as variation parameters are scaled to high levels, with less than a 3.5% accuracy drop at 10× the nominal variation level.
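The central distinction in the abstract, learning parameters against the full distribution of hardware variations rather than against one sampled hardware instance, can be illustrated with a toy sketch. This is not the authors' S-DDHR algorithm. The multiplicative Gaussian weight-variation model, the logistic-regression model, and every name below (`realize`, `make_data`, `train`, `expected_accuracy`, `per_step_sampling`) are hypothetical choices made only for illustration. Drawing a fresh variation sample at every training step is a generic Monte Carlo way of exposing training to a distribution; holding one sample fixed mimics instance-by-instance training.

```python
import math
import random

# ASSUMPTION (hypothetical variation model, not from the paper): each stored
# weight w is realized on a hardware instance as w * (1 + eps), with
# eps ~ N(0, sigma^2) drawn independently per weight.


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def realize(w, sigma, rng):
    # Draw one hardware instance of the weights under the assumed model.
    return [wi * (1.0 + rng.gauss(0.0, sigma)) for wi in w]


def make_data(n, seed):
    # Hypothetical toy dataset: linearly separable 2-D points; x[0] = 1.0 is a bias input.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        data.append(([1.0, a, b], 1 if a + b > 0 else 0))
    return data


def train(data, sigma, steps, lr, per_step_sampling, seed=0):
    # Logistic regression trained by SGD through the variation model.
    # per_step_sampling=True: fresh variation sample each step (distribution-aware).
    # per_step_sampling=False: one fixed sample, i.e. a single hardware instance.
    rng = random.Random(seed)
    n = len(data[0][0])
    w = [0.0] * n
    fixed_eps = [rng.gauss(0.0, sigma) for _ in range(n)]
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]
        eps = [rng.gauss(0.0, sigma) for _ in range(n)] if per_step_sampling else fixed_eps
        w_hw = [wi * (1.0 + e) for wi, e in zip(w, eps)]
        g = sigmoid(dot(w_hw, x)) - y  # d(log loss)/dz for z = w_hw . x
        # Chain rule: dz/dw_i = x_i * (1 + eps_i).
        w = [wi - lr * g * xi * (1.0 + e) for wi, xi, e in zip(w, x, eps)]
    return w


def expected_accuracy(w, data, sigma, trials, seed=1):
    # Mean accuracy over many freshly drawn hardware instances of the weights.
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        w_hw = realize(w, sigma, rng)
        correct += sum((sigmoid(dot(w_hw, x)) >= 0.5) == (y == 1) for x, y in data)
    return correct / (trials * len(data))


if __name__ == "__main__":
    data = make_data(200, seed=42)
    for per_step in (False, True):
        w = train(data, sigma=0.3, steps=5000, lr=0.1, per_step_sampling=per_step)
        acc = expected_accuracy(w, data, sigma=0.3, trials=100)
        print(f"per_step_sampling={per_step}: mean accuracy {acc:.3f}")
```

Resampling the variation at every step makes each SGD update an unbiased estimate of the gradient of the loss averaged over the assumed variation distribution, whereas a fixed sample optimizes the parameters for one particular instance. Evaluating both on freshly drawn instances, as `expected_accuracy` does, is what separates the two regimes.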

Original language: English (US)
Article number: 9336298
Pages (from-to): 1532-1542
Number of pages: 11
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers
Volume: 68
Issue number: 4
State: Published, Apr 2021

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering
  • Hardware and Architecture

Keywords

  • Statistical computing
  • circuit reliability
  • deep learning
  • fault tolerance
  • in-memory computing
