METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation

  • Siddhant Ray
  • Rui Pan
  • Zhuohan Gu
  • Kuntai Du
  • Shaoting Feng
  • Ganesh Ananthanarayanan
  • Ravi Netravali
  • Junchen Jiang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Scopus citation

Abstract

RAG (Retrieval Augmented Generation) allows LLMs (large language models) to generate better responses with external knowledge, but using more external knowledge causes higher response delay. Prior work focuses either on reducing the response delay (e.g., better scheduling of RAG queries) or on maximizing quality (e.g., tuning the RAG workflow), but it falls short in systematically balancing the tradeoff between the delay and quality of RAG responses. To balance quality and response delay, this paper presents METIS, the first RAG system that jointly schedules queries and adapts the key RAG configurations of each query, such as the number of retrieved text chunks and synthesis methods. Using four popular RAG-QA datasets, we show that compared to the state-of-the-art RAG optimization schemes, METIS reduces the generation latency by 1.64-2.54× without sacrificing generation quality.

Original language: English (US)
Title of host publication: SOSP 2025 - Proceedings of the 2025 ACM SIGOPS 31st Symposium on Operating Systems Principles
Publisher: Association for Computing Machinery, Inc
Pages: 606-622
Number of pages: 17
ISBN (Electronic): 9798400718700
State: Published - Oct 12 2025
Event: 31st ACM Symposium on Operating Systems Principles, SOSP 2025 - Seoul, Korea, Republic of
Duration: Oct 13 2025 to Oct 16 2025

Publication series

Name: SOSP 2025 - Proceedings of the 2025 ACM SIGOPS 31st Symposium on Operating Systems Principles

Conference

Conference: 31st ACM Symposium on Operating Systems Principles, SOSP 2025
Country/Territory: Korea, Republic of
City: Seoul
Period: 10/13/25 to 10/16/25

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software

Keywords

  • LLM inference
  • RAG systems
  • scheduling
