A Reasoning-Focused Legal Retrieval Benchmark

Lucia Zheng, Neel Guha, Javokhir Arifov, Sarah Zhang, Michal Skreta, Christopher D. Manning, Peter Henderson, Daniel E. Ho

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As the legal community increasingly examines the use of large language models (LLMs) for various legal applications, legal AI developers have turned to retrieval-augmented LLMs (“RAG” systems) to improve system performance and robustness. An obstacle to the development of specialized RAG systems is the lack of realistic legal RAG benchmarks that capture the complexity of both legal retrieval and downstream legal question-answering. To address this, we introduce two novel legal RAG benchmarks: Bar Exam QA and Housing Statute QA. Our tasks correspond to real-world legal research tasks and were produced through annotation processes that resemble legal research. We describe the construction of these benchmarks and the performance of existing retriever pipelines. Our results suggest that legal RAG remains a challenging application, thus motivating future research.

Original language: English (US)
Title of host publication: CS and LAW 2025 - Proceedings of the 2025 Symposium on Computer Science and Law
Publisher: Association for Computing Machinery, Inc
Pages: 169-193
Number of pages: 25
ISBN (Electronic): 9798400714214
State: Published - Mar 25 2025
Event: 4th ACM Symposium on Computer Science and Law, CS and LAW 2025 - Munchen, Germany
Duration: Mar 25 2025 → Mar 27 2025

Publication series

Name: CS and LAW 2025 - Proceedings of the 2025 Symposium on Computer Science and Law

Conference

Conference: 4th ACM Symposium on Computer Science and Law, CS and LAW 2025
Country/Territory: Germany
City: Munchen
Period: 3/25/25 → 3/27/25

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • Law

Keywords

  • benchmark
  • dataset
  • reasoning
  • retrieval
