TY - GEN
T1 - Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers
AU - Gu, Albert
AU - Johnson, Isys
AU - Goel, Karan
AU - Saab, Khaled
AU - Dao, Tri
AU - Rudra, Atri
AU - Ré, Christopher
N1 - Publisher Copyright:
© 2021 Neural information processing systems foundation. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency. We introduce a simple sequence model inspired by control systems that generalizes these approaches while addressing their shortcomings. The Linear State-Space Layer (LSSL) maps a sequence u → y by simply simulating a linear continuous-time state-space representation ẋ = Ax + Bu, y = Cx + Du. Theoretically, we show that LSSL models are closely related to the three aforementioned families of models and inherit their strengths. For example, they generalize convolutions to continuous-time, explain common RNN heuristics, and share features of NDEs such as time-scale adaptation. We then incorporate and generalize recent theory on continuous-time memorization to introduce a trainable subset of structured matrices A that endow LSSLs with long-range memory. Empirically, stacking LSSL layers into a simple deep neural network obtains state-of-the-art results across time series benchmarks for long dependencies in sequential image classification, real-world healthcare regression tasks, and speech. On a difficult speech classification task with length-16000 sequences, LSSL outperforms prior approaches by 24 accuracy points, and even outperforms baselines that use handcrafted features on 100x shorter sequences.
AB - Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency. We introduce a simple sequence model inspired by control systems that generalizes these approaches while addressing their shortcomings. The Linear State-Space Layer (LSSL) maps a sequence u → y by simply simulating a linear continuous-time state-space representation ẋ = Ax + Bu, y = Cx + Du. Theoretically, we show that LSSL models are closely related to the three aforementioned families of models and inherit their strengths. For example, they generalize convolutions to continuous-time, explain common RNN heuristics, and share features of NDEs such as time-scale adaptation. We then incorporate and generalize recent theory on continuous-time memorization to introduce a trainable subset of structured matrices A that endow LSSLs with long-range memory. Empirically, stacking LSSL layers into a simple deep neural network obtains state-of-the-art results across time series benchmarks for long dependencies in sequential image classification, real-world healthcare regression tasks, and speech. On a difficult speech classification task with length-16000 sequences, LSSL outperforms prior approaches by 24 accuracy points, and even outperforms baselines that use handcrafted features on 100x shorter sequences.
UR - http://www.scopus.com/inward/record.url?scp=85131769383&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131769383&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85131769383
T3 - Advances in Neural Information Processing Systems
SP - 572
EP - 585
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural information processing systems foundation
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -
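
Note appended to this record: the abstract's core mechanism, simulating ẋ = Ax + Bu, y = Cx + Du over a discrete input sequence, can be illustrated with a minimal sketch. This is not the authors' code; the discretization choice (bilinear/Tustin), function names, state dimension, and step size dt below are illustrative assumptions rather than details taken from the record.

# Minimal sketch (assumed details): discretize the continuous-time state-space
# model with a bilinear transform, then unroll the resulting recurrence.
import numpy as np

def discretize_bilinear(A, B, dt):
    """Bilinear (Tustin) discretization of (A, B) with step size dt."""
    n = A.shape[0]
    I = np.eye(n)
    inv = np.linalg.inv(I - (dt / 2) * A)
    A_bar = inv @ (I + (dt / 2) * A)
    B_bar = inv @ (dt * B)
    return A_bar, B_bar

def lssl_recurrence(u, A, B, C, D, dt=1.0):
    """Map a 1-D input sequence u to outputs y via x_{k+1} = A_bar x_k + B_bar u_k, y_k = C x_k + D u_k."""
    A_bar, B_bar = discretize_bilinear(A, B, dt)
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A_bar @ x + B_bar.flatten() * u_k   # state update
        ys.append(C @ x + D * u_k)              # readout
    return np.array(ys)

# Usage example: a random 4-dimensional state space driven by a length-16 sequence.
rng = np.random.default_rng(0)
A = -np.eye(4) + 0.1 * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 1))
C = rng.standard_normal(4)
D = 0.0
y = lssl_recurrence(rng.standard_normal(16), A, B, C, D, dt=0.1)
print(y.shape)  # (16,)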