C-STS: Conditional Semantic Textual Similarity

Ameet Deshpande, Carlos E. Jimenez, Howard Chen, Vishvak Murahari, Victoria Graf, Tanmay Rajpurohit, Ashwin Kalyan, Danqi Chen, Karthik Narasimhan

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Semantic textual similarity (STS), a cornerstone task in NLP, measures the degree of similarity between a pair of sentences, and has broad application in fields such as information retrieval and natural language understanding. However, sentence similarity can be inherently ambiguous, depending on the specific aspect of interest. We resolve this ambiguity by proposing a novel task called Conditional STS (C-STS) which measures sentences' similarity conditioned on an feature described in natural language (hereon, condition). As an example, the similarity between the sentences “The NBA player shoots a three-pointer.” and “A man throws a tennis ball into the air to serve.” is higher for the condition “The motion of the ball” (both upward) and lower for “The size of the ball” (one large and one small). C-STS's advantages are two-fold: (1) it reduces the subjectivity and ambiguity of STS and (2) enables fine-grained language model evaluation through diverse natural language conditions. We put several state-of-the-art models to the test, and even those performing well on STS (e.g. SimCSE, Flan-T5, and GPT-4) find CSTS challenging; all yielding Spearman correlation scores below 50. To encourage a more comprehensive evaluation of semantic similarity and natural language understanding, we make nearly 19K C-STS examples and code available for others to train and test their models.

Original languageEnglish (US)
Title of host publicationEMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
EditorsHouda Bouamor, Juan Pino, Kalika Bali
PublisherAssociation for Computational Linguistics (ACL)
Pages5669-5690
Number of pages22
ISBN (Electronic)9798891760608
StatePublished - 2023
Event2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 - Hybrid, Singapore, Singapore
Duration: Dec 6 2023Dec 10 2023

Publication series

NameEMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
Country/TerritorySingapore
CityHybrid, Singapore
Period12/6/2312/10/23

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'C-STS: Conditional Semantic Textual Similarity'. Together they form a unique fingerprint.

Cite this