Getting BART to Ride the Idiomatic Train: Learning to Represent Idiomatic Expressions

Ziheng Zeng, Suma Bhat

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Idiomatic expressions (IEs), characterized by their non-compositionality, are an important part of natural language. They have been a classical challenge to NLP, including pre-trained language models that drive today’s state-of-the-art. Prior work has identified deficiencies in their contextualized representation stemming from the underlying compositional paradigm of representation. In this work, we take a first-principles approach to build idiomaticity into BART using an adapter as a lightweight non-compositional language expert trained on idiomatic sentences. The improved capability over baselines (e.g., BART) is seen via intrinsic and extrinsic methods, where idiom embeddings score 0.19 points higher in homogeneity score for embedding clustering, and up to 25% higher sequence accuracy on the idiom processing tasks of IE sense disambiguation and span detection.

Original language: English (US)
Pages (from-to): 1120-1137
Number of pages: 18
Journal: Transactions of the Association for Computational Linguistics
Volume: 10
State: Published - 2022
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Communication
  • Human-Computer Interaction
  • Linguistics and Language
  • Computer Science Applications
  • Artificial Intelligence
