A Fully Row/Column-Parallel In-Memory Computing Macro in Foundry MRAM With Differential Readout for Noise Rejection

Peter Deaville, Bonan Zhang, Naveen Verma

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

This work demonstrates two integrated 256-kb in-memory computing (IMC) macros based on foundry MRAM, implemented in a 22-nm fully depleted silicon on insulator (FD-SOI) CMOS process. Embedded non-volatile memory (eNVM), including MRAM, resistive RAM (ReRAM), and phase-change memory (PCM), is an emerging class of technologies that have drawn interest for IMC due to their potential to achieve high density with advanced-node scaling as well as low-power always-on/duty-cycled operation. However, the typically low bit-cell signals (i.e., resistance contrast) necessitate high-sensitivity readout circuitry, particularly with the high levels of IMC row parallelism desired for maximizing energy efficiency and compute density. This work analyzes power supply and coupling noise, which arises and poses a primary limitation in recent high-sensitivity, high-efficiency architectures, preventing their integration and scale-up in systems on chip (SoCs). To address this, a differential readout architecture is demonstrated, which retains the previous efficiency and density while overcoming power-supply interference and coupling by over $100\times$ between the many parallel readout channels. The architecture is based on conductance-to-current ($G$-to-$I$) conversion, column-weighted combining for analog-to-digital converter (ADC) sharing, and 6-b digitization via a successive-approximation current-to-digital converter (IDC). Enabling fully parallel operation across 128–512 rows and 512 columns, the macros achieve the state-of-the-art energy efficiency of 68.6 1b-TOPS/W, the compute density of 5.43 1b-TOPS/$\text{mm}^{2}$, and the efficiency–throughput product (reciprocal of area-normalized energy-delay product) of $3.72 \times 10^{26}$, for the 256 row-parallel operation. CIFAR-10 classification is demonstrated by mapping a six-layer convolutional neural network (NN), achieving iso-software accuracy of 90.25%.
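
The three headline figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that the efficiency–throughput product is the energy efficiency (1b-ops per joule) multiplied by the compute density (1b-ops per second per mm²), which is one reading of "reciprocal of area-normalized energy-delay product". The abstract does not spell out the formula or its units, so the variable names and this interpretation are illustrative assumptions, not the authors' definition.

```python
# Figures quoted in the abstract for 256 row-parallel operation.
energy_efficiency_tops_per_w = 68.6   # 1b-TOPS/W
compute_density_tops_per_mm2 = 5.43   # 1b-TOPS/mm^2

TERA = 1e12

# Convert to base units: 1 TOPS/W is 1e12 operations per joule.
ops_per_joule = energy_efficiency_tops_per_w * TERA
ops_per_s_per_mm2 = compute_density_tops_per_mm2 * TERA

# ASSUMPTION: area-normalized energy-delay product per operation is
# (energy per op) * (delay per op * area); the abstract gives no formula.
energy_per_op = 1.0 / ops_per_joule            # joules per 1b-op
area_delay_per_op = 1.0 / ops_per_s_per_mm2    # seconds * mm^2 per 1b-op

efficiency_throughput_product = 1.0 / (energy_per_op * area_delay_per_op)

print(f"{efficiency_throughput_product:.3e}")
```

Under this assumption the result agrees with the quoted $3.72 \times 10^{26}$ to within rounding of the two input figures, which suggests the three numbers are mutually consistent.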

Original language: English (US)
Pages (from-to): 1-11
Number of pages: 11
Journal: IEEE Journal of Solid-State Circuits
State: Accepted/In press - 2024

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering

Keywords

  • Computer architecture
  • Edge computing
  • embedded non-volatile memory (eNVM)
  • Energy efficiency
  • Foundries
  • in-memory computing (IMC)
  • MRAM
  • Parallel processing
  • Phase change materials
  • Resistance
  • scalable architecture
  • Signal to noise ratio
