Optically Connected Multi-Stack HBM Modules for Large Language Model Training and Inference

Yanghui Ou, Hengrui Zhang, Austin Rovinski, David Wentzlaff, Christopher Batten

Research output: Contribution to journal › Article › peer-review

Abstract

Large language models (LLMs) have grown exponentially in size, presenting significant challenges to traditional memory architectures. Current high bandwidth memory (HBM) systems are constrained by chiplet I/O bandwidth and the limited number of HBM stacks that can be integrated due to packaging constraints. In this letter, we propose a novel memory system architecture that leverages silicon photonic interconnects to increase memory capacity and bandwidth for compute devices. By introducing optically connected multi-stack HBM modules, we extend the HBM memory system off the compute chip, significantly increasing the number of HBM stacks. Our evaluations show that this architecture can improve training efficiency for a trillion-parameter model by 1.4× compared to a modeled A100 baseline, while also enhancing inference performance by 4.2× if the L2 is modified to provide sufficient bandwidth.
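The capacity argument in the abstract can be made concrete with a rough estimate. The sketch below is a back-of-envelope calculation, not taken from the letter: it uses the common mixed-precision Adam accounting of roughly 16 bytes of state per parameter, and assumes 16 GiB HBM2e stacks with six stacks per compute package (both illustrative figures). It shows how far packaging-limited on-package HBM falls short of a trillion-parameter model's footprint, which is the gap that extending the HBM system off the compute chip is meant to close.

```python
# Back-of-envelope sketch (assumptions, not figures from the letter):
# per-parameter training state follows the usual mixed-precision + Adam
# accounting (fp16 weights/grads, fp32 master copy, two fp32 moments);
# stack capacity and stacks-per-package are illustrative packaging limits.

PARAMS = 1e12                                   # trillion-parameter model
BYTES_PER_PARAM_TRAINING = 2 + 2 + 4 + 4 + 4    # weights, grads, master, Adam m/v
BYTES_PER_PARAM_INFERENCE = 2                   # fp16 weights only (KV cache ignored)

HBM_STACK_GIB = 16          # assumed HBM2e stack capacity (GiB)
STACKS_PER_PACKAGE = 6      # assumed stacks that fit in one compute package

def gib(nbytes: float) -> float:
    """Convert a byte count to GiB."""
    return nbytes / 2**30

train_gib = gib(PARAMS * BYTES_PER_PARAM_TRAINING)
infer_gib = gib(PARAMS * BYTES_PER_PARAM_INFERENCE)
per_package_gib = HBM_STACK_GIB * STACKS_PER_PACKAGE

print(f"Training state:    ~{train_gib:,.0f} GiB "
      f"(~{train_gib / per_package_gib:,.0f} packages at {per_package_gib} GiB each)")
print(f"Inference weights: ~{infer_gib:,.0f} GiB "
      f"(~{infer_gib / per_package_gib:,.0f} packages)")
```

Under these assumptions the training state alone is on the order of 15 TiB, i.e. well over a hundred conventional packages' worth of HBM, which is why decoupling stack count from the package via optical links is attractive.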

Original language: English (US)
Journal: IEEE Computer Architecture Letters
State: Accepted/In press - 2025

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
