Temporal Consistency for LLM Reasoning Process Error Identification

  • Jiacheng Guo
  • Yue Wu
  • Jiahao Qiu
  • Kaixuan Huang
  • Xinzhe Juan
  • Ling Yang
  • Mengdi Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Verification is crucial for effective mathematical reasoning. We present a new temporal consistency method where verifiers iteratively refine their judgments based on the previous assessment. Unlike one-round verification or multi-model debate approaches, our method leverages consistency in a sequence of self-reflection actions to improve verification accuracy. Empirical evaluations across diverse mathematical process error identification benchmarks (Mathcheck, ProcessBench, and PRM800K) show consistent performance improvements over baseline methods. When applied to the recent DeepSeek-R1 distilled models, our method demonstrates strong performance, enabling 7B/8B distilled models to outperform all 70B/72B models and GPT-4o on ProcessBench. Notably, the distilled 14B model with our method achieves performance comparable to DeepSeek-R1.
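
The iterative idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the stopping rule, the aggregation scheme, or the prompts, so everything concrete here is an assumption made for illustration: the `Verifier` signature, majority voting, the `stable_rounds` and `max_rounds` parameters, and the stub verifiers that stand in for LLM calls. It is not the authors' implementation.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical verifier signature (an assumption, not from the paper):
# (problem, steps, previous_judgment) -> index of the first erroneous step,
# or -1 if the solution looks correct.
Verifier = Callable[[str, List[str], Optional[int]], int]


def temporally_consistent_verdict(
    problem: str,
    steps: List[str],
    verifiers: List[Verifier],
    stable_rounds: int = 2,  # assumed stopping parameter
    max_rounds: int = 10,    # assumed cap on reflection rounds
) -> int:
    """Repeat self-reflection until the majority verdict stops changing."""
    # Initial round: each verifier judges with no previous assessment.
    judgments = [v(problem, steps, None) for v in verifiers]
    majority = Counter(judgments).most_common(1)[0][0]
    streak = 1  # consecutive rounds with the same majority verdict
    for _ in range(max_rounds):
        if streak >= stable_rounds:
            break
        # Each verifier re-examines its own previous judgment.
        judgments = [
            v(problem, steps, prev) for v, prev in zip(verifiers, judgments)
        ]
        new_majority = Counter(judgments).most_common(1)[0][0]
        streak = streak + 1 if new_majority == majority else 1
        majority = new_majority
    return majority


def make_stub(first: int, settled: int) -> Verifier:
    """Stub standing in for an LLM call: answers `first`, then `settled`."""
    def verify(problem: str, steps: List[str], previous: Optional[int]) -> int:
        return first if previous is None else settled
    return verify


stubs = [make_stub(1, 2), make_stub(2, 2), make_stub(1, 2)]
one_round = temporally_consistent_verdict("p", ["s0", "s1", "s2"], stubs, stable_rounds=1)
refined = temporally_consistent_verdict("p", ["s0", "s1", "s2"], stubs, stable_rounds=2)
```

With these stubs, `stable_rounds=1` reduces to one-round majority voting, while `stable_rounds=2` keeps reflecting until the majority verdict repeats, which is the contrast the abstract draws between one-round verification and the iterative approach.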

Original language: English (US)
Title of host publication: EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Publisher: Association for Computational Linguistics (ACL)
Pages: 22114-22129
Number of pages: 16
ISBN (Electronic): 9798891763357
DOIs
State: Published - 2025
Externally published: Yes
Event: 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025 - Suzhou, China
Duration: Nov 4 2025 - Nov 9 2025

Publication series

Name: EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025

Conference

Conference: 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025
Country/Territory: China
City: Suzhou
Period: 11/4/25 - 11/9/25

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • Linguistics and Language
