PIE: A Parallel Idiomatic Expression Corpus for Idiomatic Sentence Generation and Paraphrasing

Jianing Zhou, Hongyu Gong, Suma Bhat

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

19 Scopus citations

Abstract

Idiomatic expressions (IEs) play an important role in natural language and have long been a “pain in the neck” for NLP systems. Despite this, text generation tasks involving IEs remain largely under-explored. To fill this research gap, we propose two new tasks: idiomatic sentence generation and idiomatic sentence paraphrasing. As the primary resource for these tasks, we introduce a curated dataset of 823 IEs and a parallel corpus that pairs sentences containing these IEs with the same sentences in which each IE has been replaced by its literal paraphrase. Using this dataset, we benchmark existing deep learning models that achieve state-of-the-art performance on related tasks, evaluating them with both automated and manual methods to inspire further research on the proposed tasks. By establishing these baselines, we pave the way for more comprehensive and accurate modeling of IEs, for both generation and paraphrasing.

Original language: English (US)
Title of host publication: MWE 2021 - 17th Workshop on Multiword Expressions, Proceedings of the Workshop
Editors: Paul Cook, Jelena Mitrovic, Carla Parra Escartin, Ashwini Vaidya, Petya Osenova, Shiva Taslimipoor, Carlos Ramisch
Publisher: Association for Computational Linguistics (ACL)
Pages: 33-48
Number of pages: 16
ISBN (Electronic): 9781954085718
State: Published - 2021
Externally published: Yes
Event: 17th Workshop on Multiword Expressions, MWE 2021 - Virtual, Bangkok, Thailand
Duration: Aug 6 2021 → …

Publication series

Name: MWE 2021 - 17th Workshop on Multiword Expressions, Proceedings of the Workshop

Conference

Conference: 17th Workshop on Multiword Expressions, MWE 2021
Country/Territory: Thailand
City: Virtual, Bangkok
Period: 8/6/21 → …

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language

