Idiomatic expression identification using semantic compatibility

Ziheng Zeng, Suma Bhat

Research output: Contribution to journal › Article › peer-review

20 Scopus citations


Idiomatic expressions are an integral part of natural language and are constantly being added to a language. Owing to their non-compositionality and their ability to take on a figurative or literal meaning depending on the sentential context, they have been a classical challenge for NLP systems. To address this challenge, we study the task of detecting whether a sentence has an idiomatic expression and localizing it when it occurs in a figurative sense. Prior research on this task has studied specific classes of idiomatic expressions, offering a limited view of how well the resulting methods generalize to new idioms. We propose a multi-stage neural architecture with attention flow as a solution. The network effectively fuses contextual and lexical information at different levels using word and sub-word representations. Empirical evaluations on three of the largest benchmark datasets with idiomatic expressions of varied syntactic patterns and degrees of non-compositionality show that our proposed model achieves new state-of-the-art results. A salient feature of the model is its ability to identify idioms unseen during training, with gains ranging from 1.4% to 30.8% over competitive baselines on the largest dataset.

Original language: English (US)
Pages (from-to): 1546-1562
Number of pages: 17
Journal: Transactions of the Association for Computational Linguistics
State: Published - Dec 30 2021
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Communication
  • Human-Computer Interaction
  • Linguistics and Language
  • Computer Science Applications
  • Artificial Intelligence

