Reshape and Adapt for Output Quantization (RAOQ): Quantization-aware Training for In-memory Computing Systems

Bonan Zhang, Chia Yu Chen, Naveen Verma

Research output: Contribution to journal › Conference article › peer-review

Abstract

In-memory computing (IMC) has emerged as a promising solution to both computation and data-movement challenges, by performing computation on data in place, directly in the memory array. IMC typically relies on analog operation, which makes analog-to-digital converters (ADCs) necessary for converting results back to the digital domain. However, ADCs maintain computational efficiency by operating at limited precision, leading to substantial quantization errors in compute outputs. This work proposes RAOQ (Reshape and Adapt for Output Quantization) to overcome this issue, comprising two classes of mechanisms: 1) mitigating ADC quantization error by adjusting the statistics of activations and weights, through an activation-shifting approach (A-shift) and a weight-reshaping technique (W-reshape); 2) adapting AI models to better tolerate ADC quantization through a bit-augmentation method (BitAug), complemented by ADC-LoRA, a low-rank approximation technique that reduces the training overhead. RAOQ demonstrates consistently high performance across neural network models of different scales and domains, on computer vision and natural language processing (NLP) tasks at various bit precisions, achieving state-of-the-art results with practical IMC implementations.
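Because the abstract turns on how a limited-precision ADC injects quantization error into IMC column outputs, and on how reshaping activation and weight statistics can mitigate it, a minimal sketch may help build intuition. Everything below is an illustrative assumption, not the paper's method: the idealized uniform ADC model, the adc_quantize helper, the signed binary weights, and the zero-centering step are stand-ins that only demonstrate the underlying statistical effect, not the actual A-shift or W-reshape formulation.

    import numpy as np

    def adc_quantize(x, bits, full_scale):
        # Idealized uniform b-bit ADC (assumed model): clip to +/- full_scale
        # and round to the nearest of 2**bits levels.
        levels = 2 ** bits
        step = 2.0 * full_scale / levels
        codes = np.clip(np.round(x / step), -(levels // 2), levels // 2 - 1)
        return codes * step

    rng = np.random.default_rng(0)
    n, cols = 256, 10000                  # accumulation length, number of IMC columns
    W = rng.choice([-1.0, 1.0], size=(cols, n))   # signed binary weights in the array
    a = rng.random(n)                     # unsigned activations in [0, 1], e.g. post-ReLU

    for name, acts in [("raw activations", a), ("zero-centered  ", a - a.mean())]:
        y = W @ acts                      # analog pre-ADC column outputs
        fs = np.abs(y).max()              # size the ADC full scale to the output range
        err = y - adc_quantize(y, bits=4, full_scale=fs)
        rms = np.sqrt(np.mean(err ** 2))
        print(f"{name}: dynamic range = {fs:6.1f}, RMS ADC error = {rms:.3f}")

In this toy setup, centering the activations concentrates the column sums around zero and roughly halves the pre-ADC dynamic range, so for a fixed ADC bit width the quantization step, and hence the RMS error, shrinks correspondingly. Per the abstract, A-shift and W-reshape pursue this kind of statistical shaping of activations and weights in a trainable, quantization-aware manner.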

Original language: English (US)
Pages (from-to): 58739-58762
Number of pages: 24
Journal: Proceedings of Machine Learning Research
Volume: 235
State: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: Jul 21 2024 – Jul 27 2024

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

