Reverie: Low Pass Filter-Based Switch Buffer Sharing for Datacenters with RDMA and TCP Traffic

Vamsi Addanki, Wei Bai, Stefan Schmid, Maria Apostolaki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

The switch buffers in datacenters today are dynamically shared by traffic classes with different loss tolerance and reaction to congestion signals. In particular, while legacy applications use loss-tolerant transport, e.g., DCTCP, newer applications require lossless datacenter transport, e.g., RDMA over Converged Ethernet. Unfortunately, as we analytically show in this paper, the buffer-sharing practices of today’s datacenters pose a fundamental limitation to effectively isolate RDMA and TCP while also maximizing burst absorption. We identify two root causes: (i) the buffer-sharing for RDMA and TCP relies on two independent and often conflicting views of the buffer, namely ingress and egress; and (ii) the buffer-sharing scheme micromanages the buffer and overreacts to the changes in its occupancy during transient congestion. In this paper, we present REVERIE, a buffer-sharing scheme, which, unlike prior works, is suitable for both lossless and loss-tolerant traffic, providing isolation and better burst absorption than state-of-the-art buffer-sharing schemes. At the core of REVERIE lies a unified (consolidated ingress and egress) admission control that jointly optimizes the buffers for both RDMA and TCP. REVERIE allocates buffer based on a low-pass filter that naturally absorbs bursty queue lengths during transient congestion within the buffer limits. Our evaluation shows that REVERIE can improve the performance of RDMA as well as TCP in terms of flow completion times by up to 33%.
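To illustrate the low-pass filtering idea mentioned in the abstract, here is a minimal sketch that assumes a first-order filter (an exponentially weighted moving average) over sampled buffer occupancy. The function names, the `alpha` and `gain` parameters, and the threshold rule are hypothetical and chosen only for illustration; the abstract does not specify REVERIE's actual admission-control rule, so this is not the authors' algorithm.

```python
def low_pass(samples, alpha):
    """First-order low-pass filter (EWMA) over a sequence of samples.

    alpha is a hypothetical smoothing factor in (0, 1]; smaller values
    react more slowly to sudden changes in the input.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    state = 0.0
    filtered = []
    for x in samples:
        state = (1.0 - alpha) * state + alpha * x
        filtered.append(state)
    return filtered


def threshold(total_buffer, occupancy, gain):
    """Hypothetical per-queue limit: a gain times the remaining buffer."""
    return gain * max(total_buffer - occupancy, 0.0)


# A short burst: occupancy jumps to 100 units for one sample.
occupancy = [0, 0, 100, 0]
smoothed = low_pass(occupancy, alpha=0.5)  # [0.0, 0.0, 50.0, 25.0]

# Thresholds driven by instantaneous vs. filtered occupancy.
raw_limits = [threshold(1000, q, gain=0.5) for q in occupancy]
smooth_limits = [threshold(1000, q, gain=0.5) for q in smoothed]
```

In this toy trace the limit computed from the instantaneous occupancy drops by 50 units during the burst, while the limit computed from the filtered occupancy drops by only 25. That is the sense in which a filtered signal reacts less sharply to transient congestion.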

Original language: English (US)
Title of host publication: Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, NSDI 2024
Publisher: USENIX Association
Pages: 651-668
Number of pages: 18
ISBN (Electronic): 9781939133397
State: Published - 2024
Event: 21st USENIX Symposium on Networked Systems Design and Implementation, NSDI 2024 - Santa Clara, United States
Duration: Apr 16 2024 to Apr 18 2024

Publication series

Name: Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, NSDI 2024

Conference

Conference: 21st USENIX Symposium on Networked Systems Design and Implementation, NSDI 2024
Country/Territory: United States
City: Santa Clara
Period: 4/16/24 to 4/18/24

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Control and Systems Engineering
