Abstract
Idiomatic expressions are an integral part of natural language and constantly being added to a language. Owing to their non-compositionality and their ability to take on a figurative or literal meaning depending on the sentential context, they have been a classical challenge for NLP systems. To address this challenge, we study the task of detecting whether a sentence has an idiomatic expression and localizing it when it occurs in a figurative sense. Prior research for this task has studied specific classes of idiomatic expressions offering limited views of their generalizability to new idioms. We propose a multi-stage neural architecture with attention flow as a solution. The network effectively fuses contextual and lexical information at different levels using word and sub-word representations. Empirical evaluations on three of the largest benchmark datasets with idiomatic expressions of varied syntactic patterns and degrees of non-compositionality show that our proposed model achieves new state-of-the-art results. A salient feature of the model is its ability to identify idioms unseen during training with gains from 1.4% to 30.8% over competitive baselines on the largest dataset.
Original language | English (US) |
---|---|
Pages (from-to) | 1546-1562 |
Number of pages | 17 |
Journal | Transactions of the Association for Computational Linguistics |
Volume | 9 |
DOIs | |
State | Published - Dec 30 2021 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Communication
- Human-Computer Interaction
- Linguistics and Language
- Computer Science Applications
- Artificial Intelligence