TY - GEN
T1 - MUX-PLMs
T2 - 8th Workshop on Representation Learning for NLP, RepL4NLP 2023, co-located with ACL 2023
AU - Murahari, Vishvak
AU - Deshpande, Ameet
AU - Jimenez, Carlos E.
AU - Shafran, Izhak
AU - Wang, Mingqiu
AU - Cao, Yuan
AU - Narasimhan, Karthik
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - The widespread adoption of large language models such as ChatGPT and Bard has led to unprecedented demand for these technologies. The burgeoning cost of inference for ever-increasing model sizes, coupled with hardware shortages, has limited affordable access and poses a pressing need for efficiency approaches geared towards high throughput and performance. Multi-input multi-output (MIMO) algorithms such as data multiplexing offer a promising solution, with a many-fold increase in throughput achieved by performing inference for multiple inputs at the cost of a single input. Yet these approaches are not currently performant enough to be deployed in modern systems. We change that by developing MUX-PLMs, a class of high-throughput pre-trained language models (PLMs) trained with data multiplexing that can be fine-tuned for any downstream task to yield high throughput and high performance. Our novel multiplexing and demultiplexing modules proficiently entangle and disentangle inputs, enabling high-performance, high-throughput MUX-PLMs that are competitive with vanilla PLMs while achieving a 2x/5x inference speedup with only a 1-4% performance drop on a broad suite of tasks.
UR - http://www.scopus.com/inward/record.url?scp=85174542753&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174542753&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85174542753
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 196
EP - 211
BT - ACL 2023 - 8th Workshop on Representation Learning for NLP, RepL4NLP 2023 - Proceedings of the Workshop
A2 - Can, Burcu
A2 - Mozes, Maximilian
A2 - Cahyawijaya, Samuel
A2 - Saphra, Naomi
A2 - Kassner, Nora
A2 - Ravfogel, Shauli
A2 - Ravichander, Abhilasha
A2 - Zhao, Chen
A2 - Augenstein, Isabelle
A2 - Rogers, Anna
A2 - Cho, Kyunghyun
A2 - Grefenstette, Edward
A2 - Voita, Lena
PB - Association for Computational Linguistics (ACL)
Y2 - 13 July 2023
ER -