KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation

Ta Chung Chi, Ting Han Fan, Peter J. Ramadge, Alexander I. Rudnicky

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Scopus citations

Abstract

Relative positional embeddings (RPE) have received considerable attention since RPEs effectively model the relative distance among tokens and enable length extrapolation. We propose KERPLE, a framework that generalizes relative position embedding for extrapolation by kernelizing positional differences. We achieve this goal using conditionally positive definite (CPD) kernels, a class of functions known for generalizing distance metrics. To maintain the inner product interpretation of self-attention, we show that a CPD kernel can be transformed into a PD kernel by adding a constant offset. This offset is implicitly absorbed in the Softmax normalization during self-attention. The diversity of CPD kernels allows us to derive various RPEs that enable length extrapolation in a principled way. Experiments demonstrate that the logarithmic variant achieves excellent extrapolation performance on three large language modeling datasets. Our implementation and pretrained checkpoints are released at https://github.com/chijames/KERPLE.git.

Original languageEnglish (US)
Title of host publicationAdvances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
EditorsS. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
PublisherNeural information processing systems foundation
ISBN (Electronic)9781713871088
StatePublished - 2022
Event36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States
Duration: Nov 28 2022Dec 9 2022

Publication series

NameAdvances in Neural Information Processing Systems
Volume35
ISSN (Print)1049-5258

Conference

Conference36th Conference on Neural Information Processing Systems, NeurIPS 2022
Country/TerritoryUnited States
CityNew Orleans
Period11/28/2212/9/22

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint

Dive into the research topics of 'KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation'. Together they form a unique fingerprint.

Cite this