Position Information Emerges in Causal Transformers Without Positional Encodings via Similarity of Nearby Embeddings

Chunsheng Zuo, Pavel Guerzhoy, Michael Guerzhoy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Transformers with causal attention can solve tasks that require positional information without using positional encodings. In this work, we propose and investigate a new hypothesis about how positional information can be stored without using explicit positional encodings. We observe that nearby embeddings are more similar to each other than faraway embeddings, allowing the transformer to potentially reconstruct the positions of tokens. We show that this pattern can occur in both trained and randomly initialized Transformer models with causal attention and no positional encodings, over a common range of hyperparameters.
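The hypothesis in the abstract lends itself to a direct probe. The following is a minimal sketch (not the authors' code), assuming a small, randomly initialized PyTorch Transformer encoder with a causal attention mask and token embeddings only, i.e., no positional encodings; it measures how the average cosine similarity between hidden states falls off as the distance between positions grows. The model size, sequence length, and distances probed are illustrative assumptions.

```python
# Probe whether hidden states at nearby positions are more similar than at
# distant positions in a randomly initialized causal Transformer with no
# positional encodings (hyperparameters below are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, d_model, n_layers, seq_len = 1000, 128, 4, 64

embed = nn.Embedding(vocab_size, d_model)  # token embeddings only, no positional encoding
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
encoder.eval()  # disable dropout for a deterministic probe

tokens = torch.randint(0, vocab_size, (1, seq_len))
# Additive causal mask: position i may only attend to positions <= i.
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

with torch.no_grad():
    hidden = encoder(embed(tokens), mask=causal_mask)  # (1, seq_len, d_model)

# Cosine similarity between every pair of hidden states, then averaged over
# all pairs separated by a given distance.
h = F.normalize(hidden[0], dim=-1)
sim = h @ h.T  # (seq_len, seq_len)
for dist in (1, 4, 16, 48):
    mean_sim = sim.diagonal(dist).mean().item()
    print(f"distance {dist:2d}: mean cosine similarity {mean_sim:.3f}")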

Original language: English (US)
Title of host publication: Main Conference
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Publisher: Association for Computational Linguistics (ACL)
Pages: 9418-9430
Number of pages: 13
ISBN (Electronic): 9798891761964
State: Published - 2025
Externally published: Yes
Event: 31st International Conference on Computational Linguistics, COLING 2025 - Abu Dhabi, United Arab Emirates
Duration: Jan 19, 2025 – Jan 24, 2025

Publication series

Name: Proceedings - International Conference on Computational Linguistics, COLING
Volume: Part F206484-1
ISSN (Print): 2951-2093

Conference

Conference: 31st International Conference on Computational Linguistics, COLING 2025
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 1/19/25 – 1/24/25

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Theoretical Computer Science
