TY - GEN
T1 - Catching the microburst culprits with snappy
AU - Chen, Xiaoqi
AU - Feibish, Shir Landau
AU - Koral, Yaron
AU - Rexford, Jennifer L.
AU - Rottenstreich, Ori
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/8/7
Y1 - 2018/8/7
AB - Short-lived traffic surges, known as microbursts, can cause periods of unexpectedly high packet delay and loss on a link. Today, preventing microbursts requires deploying switches with larger packet buffers (incurring higher cost) or running the network at low utilization (sacrificing efficiency). Instead, we argue that switches should detect microbursts as they form, and take corrective action before the situation gets worse. This requires an efficient way for switches to identify the particular flows responsible for a microburst, and handle them automatically (e.g., by pacing, marking, or rerouting the packets). However, collecting fine-grained statistics about queue occupancy in real time is challenging, even with emerging programmable data planes. We present Snappy, which identifies the flows responsible for a microburst in real time. Snappy maintains multiple snapshots of the occupants of the queue over time, where each snapshot is a compact data structure that makes efficient use of data-plane memory. As each new packet arrives, Snappy updates one snapshot and also estimates the fraction of the queue occupied by the associated flow. Our simulations with data-center packet traces show that Snappy can target the flows responsible for microbursts at the sub-millisecond level.
UR - http://www.scopus.com/inward/record.url?scp=85055124508&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055124508&partnerID=8YFLogxK
U2 - 10.1145/3229584.3229586
DO - 10.1145/3229584.3229586
M3 - Conference contribution
AN - SCOPUS:85055124508
SN - 9781450359146
T3 - SelfDN 2018 - Proceedings of the 2018 Afternoon Workshop on Self-Driving Networks, Part of SIGCOMM 2018
SP - 22
EP - 28
BT - SelfDN 2018 - Proceedings of the 2018 Afternoon Workshop on Self-Driving Networks, Part of SIGCOMM 2018
PB - Association for Computing Machinery, Inc
T2 - 2018 Afternoon Workshop on Self-Driving Networks, SelfDN 2018
Y2 - 24 August 2018
ER -