Abstract
Idiomatic expressions (IEs), characterized by their non-compositionality, are an important part of natural language. They have been a classical challenge to NLP, including pre-trained language models that drive today’s state-of-the-art. Prior work has identified deficiencies in their contextualized representation stemming from the underlying compositional paradigm of representation. In this work, we take a first-principles approach to build idiomaticity into BART using an adapter as a lightweight non-compositional language expert trained on idiomatic sentences. The improved capability over baselines (e.g., BART) is seen via intrinsic and extrinsic methods, where idiom embeddings score 0.19 points higher in homogeneity score for embedding clustering, and up to 25% higher sequence accuracy on the idiom processing tasks of IE sense disambiguation and span detection.
Original language | English (US) |
---|---|
Pages (from-to) | 1120-1137 |
Number of pages | 18 |
Journal | Transactions of the Association for Computational Linguistics |
Volume | 10 |
State | Published - 2022 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Communication
- Human-Computer Interaction
- Linguistics and Language
- Computer Science Applications
- Artificial Intelligence