## Abstract

This work demonstrates two integrated 256-kb in-memory computing (IMC) macros based on foundry MRAM, implemented in a 22-nm fully depleted silicon-on-insulator (FD-SOI) CMOS process. Embedded non-volatile memory (eNVM), including MRAM, resistive RAM (ReRAM), and phase-change memory (PCM), is an emerging class of technologies that has drawn interest for IMC due to its potential to achieve high density with advanced-node scaling as well as low-power always-on/duty-cycled operation. However, the typically low bit-cell signals (i.e., resistance contrast) necessitate high-sensitivity readout circuitry, particularly with the high levels of IMC row parallelism desired for maximizing energy efficiency and compute density. This work analyzes power-supply and coupling noise, which poses a primary limitation in recent high-sensitivity, high-efficiency architectures, preventing their integration and scale-up in systems on chip (SoCs). To address this, a differential readout architecture is demonstrated, which retains the previous efficiency and density while suppressing power-supply interference and coupling between the many parallel readout channels by over $100\times$. The architecture is based on conductance-to-current ($G$-to-$I$) conversion, column-weighted combining for analog-to-digital converter (ADC) sharing, and 6-b digitization via a successive-approximation current-to-digital converter (IDC).
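The readout chain described above ends in a 6-b successive-approximation conversion of a current to a digital code. As a rough behavioral illustration only, and not the authors' circuit, the sketch below runs the usual binary search from the most significant bit down. The function name `sar_idc`, the full-scale current `i_full_scale`, and the ideal, noise-free comparator are all assumptions made for this example.

```python
def sar_idc(i_in, i_full_scale, bits=6):
    """Behavioral successive-approximation current-to-digital conversion.

    Hypothetical model (not from the paper): an ideal comparator and ideal
    binary-weighted reference currents spanning [0, i_full_scale).
    """
    i_lsb = i_full_scale / (1 << bits)  # current represented by one code step
    code = 0
    # Decide one bit per cycle, starting from the most significant bit.
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        # Keep the trial bit if the reference current does not exceed the input.
        if trial * i_lsb <= i_in:
            code = trial
    return code


# Example with an arbitrary full scale of 64 units, so one LSB is 1 unit.
print(sar_idc(10.5, 64.0))   # prints 10
print(sar_idc(100.0, 64.0))  # over-range input saturates: prints 63
```

Six comparison cycles resolve 64 levels, which is why a successive-approximation converter suits a readout shared across many columns: the cost per conversion grows with the bit count, not with the number of levels.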
Enabling fully parallel operation across 128–512 rows and 512 columns, the macros achieve state-of-the-art energy efficiency of 68.6 1b-TOPS/W, compute density of 5.43 1b-TOPS/$\text{mm}^{2}$, and an efficiency–throughput product (reciprocal of area-normalized energy-delay product) of $3.72 \times 10^{26}$ for 256-row-parallel operation. CIFAR-10 classification is demonstrated by mapping a six-layer convolutional neural network (NN), achieving iso-software accuracy of 90.25%.
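The efficiency–throughput product quoted above can be checked against the other two figures. Energy efficiency is the reciprocal of energy per operation, and compute density is the reciprocal of (time per operation × area), so their product is the reciprocal of an area-normalized energy-delay product. The sketch below assumes the product is formed by multiplying the two quoted figures after converting TOPS to OPS. The abstract does not state its units, and the function name is hypothetical.

```python
TERA = 1e12  # operations per second in one TOPS

# Figures quoted in the abstract for 256-row-parallel operation.
energy_efficiency_tops_per_w = 68.6   # 1b-TOPS/W
compute_density_tops_per_mm2 = 5.43   # 1b-TOPS/mm^2


def efficiency_throughput_product(tops_per_w, tops_per_mm2):
    """Multiply efficiency and density after converting TOPS to OPS.

    Assumed interpretation of the abstract's metric, not a formula
    given in the source.
    """
    return (tops_per_w * TERA) * (tops_per_mm2 * TERA)


product = efficiency_throughput_product(
    energy_efficiency_tops_per_w, compute_density_tops_per_mm2
)
print(f"{product:.3e}")
```

Under this assumption the result agrees with the quoted $3.72 \times 10^{26}$ to within the rounding of the two input figures.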

Original language | English (US) |
---|---|
Pages (from-to) | 1-11 |
Number of pages | 11 |
Journal | IEEE Journal of Solid-State Circuits |
DOIs | |
State | Accepted/In press - 2024 |

## All Science Journal Classification (ASJC) codes

- Electrical and Electronic Engineering

## Keywords

- Computer architecture
- Edge computing
- Embedded non-volatile memory (eNVM)
- Energy efficiency
- Foundries
- In-memory computing (IMC)
- MRAM
- Parallel processing
- Phase change materials
- Resistance
- Scalable architecture
- Signal to noise ratio