TY - GEN
T1 - Temporal Consistency for LLM Reasoning Process Error Identification
AU - Guo, Jiacheng
AU - Wu, Yue
AU - Qiu, Jiahao
AU - Huang, Kaixuan
AU - Juan, Xinzhe
AU - Yang, Ling
AU - Wang, Mengdi
N1 - Publisher Copyright:
© 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - Verification is crucial for effective mathematical reasoning. We present a new temporal consistency method where verifiers iteratively refine their judgments based on the previous assessment. Unlike one-round verification or multi-model debate approaches, our method leverages consistency in a sequence of self-reflection actions to improve verification accuracy. Empirical evaluations across diverse mathematical process error identification benchmarks (Mathcheck, ProcessBench, and PRM800K) show consistent performance improvements over baseline methods. When applied to the recent DeepSeek R1 distilled models, our method demonstrates strong performance, enabling 7B/8B distilled models to outperform all 70B/72B models and GPT-4o on ProcessBench. Notably, the distilled 14B model with our method achieves performance comparable to Deepseek-R1.
AB - Verification is crucial for effective mathematical reasoning. We present a new temporal consistency method where verifiers iteratively refine their judgments based on the previous assessment. Unlike one-round verification or multi-model debate approaches, our method leverages consistency in a sequence of self-reflection actions to improve verification accuracy. Empirical evaluations across diverse mathematical process error identification benchmarks (Mathcheck, ProcessBench, and PRM800K) show consistent performance improvements over baseline methods. When applied to the recent DeepSeek R1 distilled models, our method demonstrates strong performance, enabling 7B/8B distilled models to outperform all 70B/72B models and GPT-4o on ProcessBench. Notably, the distilled 14B model with our method achieves performance comparable to Deepseek-R1.
UR - https://www.scopus.com/pages/publications/105028961586
UR - https://www.scopus.com/pages/publications/105028961586#tab=citedBy
U2 - 10.18653/v1/2025.findings-emnlp.1205
DO - 10.18653/v1/2025.findings-emnlp.1205
M3 - Conference contribution
AN - SCOPUS:105028961586
T3 - EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025
SP - 22114
EP - 22129
BT - EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025
A2 - Christodoulopoulos, Christos
A2 - Chakraborty, Tanmoy
A2 - Rose, Carolyn
A2 - Peng, Violet
PB - Association for Computational Linguistics (ACL)
T2 - 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025
Y2 - 4 November 2025 through 9 November 2025
ER -