Abstract
Idiomatic expressions (IEs), characterized by their non-compositionality, are an important part of natural language. They have been a classical challenge to NLP, including pre-trained language models that drive today’s state-of-the-art. Prior work has identified deficiencies in their contextualized representation stemming from the underlying compositional paradigm of representation. In this work, we take a first-principles approach to build idiomaticity into BART using an adapter as a lightweight non-compositional language expert trained on idiomatic sentences. The improved capability over baselines (e.g., BART) is seen via intrinsic and extrinsic methods, where idiom embeddings score 0.19 points higher in homogeneity score for embedding clustering, and up to 25% higher sequence accuracy on the idiom processing tasks of IE sense disambiguation and span detection.
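To illustrate the adapter idea mentioned in the abstract, the sketch below shows one way a lightweight bottleneck adapter could be attached to a frozen BART encoder so that only the adapter parameters are trained on idiomatic sentences. This is a minimal sketch under standard assumptions about adapter design: the `BottleneckAdapter` class, the bottleneck width of 64, and the hook-based wiring are illustrative, not the paper's released implementation.

```python
import torch
import torch.nn as nn
from transformers import BartModel


class BottleneckAdapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, with a residual connection."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))


bart = BartModel.from_pretrained("facebook/bart-base")
for p in bart.parameters():
    p.requires_grad = False  # keep the pre-trained BART weights frozen

# One adapter per encoder layer; only these parameters would be updated
# when training on idiomatic sentences.
adapters = nn.ModuleList(
    BottleneckAdapter(bart.config.d_model) for _ in bart.encoder.layers
)


def make_hook(adapter: BottleneckAdapter):
    # BartEncoderLayer returns a tuple whose first element is the hidden states;
    # returning a new tuple from the hook replaces the layer's output.
    def hook(module, inputs, output):
        return (adapter(output[0]),) + output[1:]

    return hook


for layer, adapter in zip(bart.encoder.layers, adapters):
    layer.register_forward_hook(make_hook(adapter))
```

Because the pre-trained weights stay fixed, the base model retains its compositional behavior while the small adapter specializes as the non-compositional "expert" described in the abstract.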
| Original language | English (US) |
|---|---|
| Pages (from-to) | 1120-1137 |
| Number of pages | 18 |
| Journal | Transactions of the Association for Computational Linguistics |
| Volume | 10 |
| DOIs | |
| State | Published - 2022 |
| Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Communication
- Linguistics and Language
- Human-Computer Interaction
- Computer Science Applications
- Artificial Intelligence