MUX-PLMs: Data Multiplexing for High-throughput Language Models

Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik Narasimhan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

The widespread adoption of large language models such as ChatGPT and Bard has led to unprecedented demand for these technologies. The burgeoning cost of inference for ever-increasing model sizes, coupled with hardware shortages, has limited affordable access and poses a pressing need for efficiency approaches geared towards high throughput and performance. Multi-input multi-output (MIMO) algorithms such as data multiplexing offer a promising solution: a many-fold increase in throughput by performing inference for multiple inputs at the cost of a single input. Yet these approaches are not currently performant enough to be deployed in modern systems. We change that by developing MUX-PLMs, a class of high-throughput pre-trained language models (PLMs) trained with data multiplexing that can be fine-tuned for any downstream task to yield high throughput and high performance. Our novel multiplexing and demultiplexing modules proficiently entangle and disentangle inputs, enabling high-performance, high-throughput MUX-PLMs that are competitive with vanilla PLMs while achieving 2x/5x inference speedups with only a 1-4% performance drop on a broad suite of tasks.
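
To make the data multiplexing idea concrete, the PyTorch sketch below is a minimal illustration, not the paper's actual modules: it multiplexes N inputs by applying a distinct fixed random transform to each instance before averaging them into one "superposed" sequence, runs a single shared forward pass, and demultiplexes with learned per-instance key vectors. All names here (DataMultiplexer, DataDemultiplexer, num_instances, and the stand-in encoder) are illustrative assumptions.

import torch
import torch.nn as nn

class DataMultiplexer(nn.Module):
    def __init__(self, hidden_dim, num_instances):
        super().__init__()
        # One fixed random linear transform per instance slot; giving each
        # instance a distinct transform before averaging is what lets the
        # demultiplexer tell the instances apart later.
        self.transforms = nn.Parameter(
            torch.randn(num_instances, hidden_dim, hidden_dim) / hidden_dim ** 0.5,
            requires_grad=False)

    def forward(self, embeddings):
        # embeddings: (batch, num_instances, seq_len, hidden_dim)
        mixed = torch.einsum("bnsh,nhd->bnsd", embeddings, self.transforms)
        return mixed.mean(dim=1)  # (batch, seq_len, hidden_dim): one sequence

class DataDemultiplexer(nn.Module):
    def __init__(self, hidden_dim, num_instances):
        super().__init__()
        # A learned key vector per instance slot; concatenating it with the
        # shared hidden state and projecting recovers per-instance outputs.
        self.instance_keys = nn.Parameter(torch.randn(num_instances, hidden_dim))
        self.proj = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, hidden):
        # hidden: (batch, seq_len, hidden_dim)
        b, s, h = hidden.shape
        n = self.instance_keys.shape[0]
        keys = self.instance_keys.view(1, n, 1, h).expand(b, n, s, h)
        shared = hidden.unsqueeze(1).expand(b, n, s, h)
        return self.proj(torch.cat([shared, keys], dim=-1))  # (b, n, s, h)

if __name__ == "__main__":
    mux = DataMultiplexer(hidden_dim=64, num_instances=5)
    demux = DataDemultiplexer(hidden_dim=64, num_instances=5)
    encoder = nn.Identity()  # stand-in for any pre-trained transformer encoder
    x = torch.randn(2, 5, 16, 64)  # 2 batches of 5 multiplexed inputs
    superposed = mux(x)  # (2, 16, 64): one sequence encodes five inputs
    recovered = demux(encoder(superposed))  # (2, 5, 16, 64)
    print(recovered.shape)

Because the encoder processes a single superposed sequence, compute per batch stays roughly constant while effective throughput scales with num_instances; the small accuracy drop reported in the abstract is the price of recovering individual outputs from the shared representation.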

Original language: English (US)
Title of host publication: ACL 2023 - 8th Workshop on Representation Learning for NLP, RepL4NLP 2023 - Proceedings of the Workshop
Editors: Burcu Can, Maximilian Mozes, Samuel Cahyawijaya, Naomi Saphra, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Chen Zhao, Isabelle Augenstein, Anna Rogers, Kyunghyun Cho, Edward Grefenstette, Lena Voita
Publisher: Association for Computational Linguistics (ACL)
Pages: 196-211
Number of pages: 16
ISBN (Electronic): 9781959429777
State: Published - 2023
Event: 8th Workshop on Representation Learning for NLP, RepL4NLP 2023, co-located with ACL 2023 - Toronto, Canada
Duration: Jul 13 2023 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 8th Workshop on Representation Learning for NLP, RepL4NLP 2023, co-located with ACL 2023
Country/Territory: Canada
City: Toronto
Period: 7/13/23 → …

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
