TY - GEN
T1 - DataMUX: Data Multiplexing for Neural Networks
T2 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
AU - Murahari, Vishvak
AU - Jimenez, Carlos E.
AU - Yang, Runzhe
AU - Narasimhan, Karthik
N1 - Publisher Copyright:
© 2022 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2022
Y1 - 2022
AB - In this paper, we introduce data multiplexing (DataMUX), a technique that enables deep neural networks to process multiple inputs simultaneously using a single compact representation. DataMUX demonstrates that neural networks are capable of generating accurate predictions over mixtures of inputs, resulting in increased inference throughput with minimal extra memory requirements. Our approach consists of two key components: (1) a multiplexing layer that applies a fixed linear transformation to each input before combining them to create a 'mixed' representation of the same size as a single input, which is then processed by the base network, and (2) a demultiplexing layer that converts the base network's output back into independent representations before producing predictions for each input. We show the viability of DataMUX for different architectures (Transformers, and to a much lesser extent MLPs and CNNs) across six tasks spanning sentence classification, named entity recognition, and image classification. For instance, DataMUX for Transformers can multiplex up to 20x/40x inputs, achieving up to an 11x/18x increase in inference throughput with absolute performance drops of < 2% and < 4%, respectively, compared to a vanilla Transformer on MNLI, a natural language inference task. We also provide a theoretical construction for multiplexing in self-attention networks and analyze the effect of various design elements in DataMUX.
UR - http://www.scopus.com/inward/record.url?scp=85163185827&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163185827&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85163185827
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
A2 - Koyejo, S.
A2 - Mohamed, S.
A2 - Agarwal, A.
A2 - Belgrave, D.
A2 - Cho, K.
A2 - Oh, A.
PB - Neural Information Processing Systems Foundation
Y2 - 28 November 2022 through 9 December 2022
ER -
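
As an informal aid to the abstract above, here is a minimal PyTorch sketch of the multiplexing/demultiplexing scheme it describes. All names (DataMUXSketch, num_instances, and so on) are illustrative assumptions, not the authors' released implementation, and the demultiplexer is simplified to one linear head per instance, whereas the paper uses learned index-specific transformations over the base network's outputs.

# Minimal sketch of the DataMUX idea: N inputs -> fixed per-instance linear
# transforms -> one 'mixed' representation -> shared base network -> one
# prediction head per instance. Names and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class DataMUXSketch(nn.Module):
    def __init__(self, dim=64, num_instances=4, num_classes=3):
        super().__init__()
        self.num_instances = num_instances
        # Multiplexing: one fixed (frozen) linear transform per input instance.
        self.mux = nn.ModuleList(
            [nn.Linear(dim, dim, bias=False) for _ in range(num_instances)]
        )
        for layer in self.mux:
            layer.weight.requires_grad_(False)  # fixed transforms, as in the paper
        # Base network: stands in for the shared Transformer/MLP/CNN backbone.
        self.base = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        # Demultiplexing (simplified): one prediction head per instance.
        self.demux = nn.ModuleList(
            [nn.Linear(dim, num_classes) for _ in range(num_instances)]
        )

    def forward(self, xs):
        # xs: (num_instances, batch, dim) -- N inputs processed in one pass.
        mixed = torch.stack(
            [self.mux[i](xs[i]) for i in range(self.num_instances)]
        ).mean(dim=0)  # 'mixed' representation, same size as a single input
        h = self.base(mixed)  # single forward pass through the base network
        # Recover one set of predictions per multiplexed instance.
        return [self.demux[i](h) for i in range(self.num_instances)]

model = DataMUXSketch()
inputs = torch.randn(4, 8, 64)  # 4 multiplexed instances, batch of 8
logits = model(inputs)          # list of 4 tensors, each (8, num_classes)

The sketch illustrates why throughput improves: the base network, which dominates compute, runs once over the mixed representation regardless of how many instances were multiplexed; only the lightweight mux/demux layers scale with N.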