TY - GEN
T1 - Transformer Working Memory Enables Regular Language Reasoning And Natural Language Length Extrapolation
AU - Chi, Ta-Chung
AU - Fan, Ting-Han
AU - Rudnicky, Alexander I.
AU - Ramadge, Peter J.
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - Conventional wisdom has it that, unlike recurrent models, Transformers cannot perfectly model regular languages. Inspired by the notion of working memory, we propose a new Transformer variant named RegularGPT. With its novel combination of Weight-Sharing, Adaptive-Depth, and Sliding-Dilated-Attention, RegularGPT constructs working memory along the depth dimension, thereby enabling efficient and successful modeling of regular languages such as PARITY. We further test RegularGPT on the task of natural language length extrapolation and find, surprisingly, that it rediscovers the local windowed attention effect deemed necessary in prior work for length extrapolation.
UR - http://www.scopus.com/inward/record.url?scp=85183293682&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85183293682&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85183293682
T3 - Findings of the Association for Computational Linguistics: EMNLP 2023
SP - 5972
EP - 5984
BT - Findings of the Association for Computational Linguistics: EMNLP 2023
PB - Association for Computational Linguistics (ACL)
T2 - 2023 Findings of the Association for Computational Linguistics: EMNLP 2023
Y2 - 6 December 2023 through 10 December 2023
ER -