TY - JOUR
T1 - Optically Connected Multi-Stack HBM Modules for Large Language Model Training and Inference
AU - Ou, Yanghui
AU - Zhang, Hengrui
AU - Rovinski, Austin
AU - Wentzlaff, David
AU - Batten, Christopher
N1 - Publisher Copyright: © 2002-2011 IEEE.
PY - 2025
Y1 - 2025
N2 - Large language models (LLMs) have grown exponentially in size, presenting significant challenges to traditional memory architectures. Current high bandwidth memory (HBM) systems are constrained by chiplet I/O bandwidth and the limited number of HBM stacks that can be integrated due to packaging constraints. In this letter, we propose a novel memory system architecture that leverages silicon photonic interconnects to increase memory capacity and bandwidth for compute devices. By introducing optically connected multi-stack HBM modules, we extend the HBM memory system off the compute chip, significantly increasing the number of HBM stacks. Our evaluations show that this architecture can improve training efficiency for a trillion-parameter model by 1.4× compared to a modeled A100 baseline, while also enhancing inference performance by 4.2× if the L2 is modified to provide sufficient bandwidth.
AB - Large language models (LLMs) have grown exponentially in size, presenting significant challenges to traditional memory architectures. Current high bandwidth memory (HBM) systems are constrained by chiplet I/O bandwidth and the limited number of HBM stacks that can be integrated due to packaging constraints. In this letter, we propose a novel memory system architecture that leverages silicon photonic interconnects to increase memory capacity and bandwidth for compute devices. By introducing optically connected multi-stack HBM modules, we extend the HBM memory system off the compute chip, significantly increasing the number of HBM stacks. Our evaluations show that this architecture can improve training efficiency for a trillion-parameter model by 1.4× compared to a modeled A100 baseline, while also enhancing inference performance by 4.2× if the L2 is modified to provide sufficient bandwidth.
UR - http://www.scopus.com/inward/record.url?scp=85218807912&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85218807912&partnerID=8YFLogxK
U2 - 10.1109/LCA.2025.3540058
DO - 10.1109/LCA.2025.3540058
M3 - Article
AN - SCOPUS:85218807912
SN - 1556-6056
JO - IEEE Computer Architecture Letters
JF - IEEE Computer Architecture Letters
ER -