Riffle

Optimized Shuffle Service for Large-Scale Data Analytics

Haoyu Zhang, Brian Cho, Ergin Seyfe, Avery Ching, Michael Joseph Freedman

Research output: Chapter in Book/Report/Conference proceedingConference contribution

6 Citations (Scopus)

Abstract

The rapidly growing size of data and complexity of analytics present new challenges for large-scale data processing systems. Modern systems keep data partitions in memory for pipelined operators, and persist data across stages with wide dependencies on disks for fault tolerance. While processing can often scale well by splitting jobs into smaller tasks for better parallelism, all-to-all data transfer—called shuffle operations—become the scaling bottleneck when running many small tasks in multi-stage data analytics jobs. Our key observation is that this bottleneck is due to the superlinear increase in disk I/O operations as data volume increases. We present Riffle, an optimized shuffle service for big-data analytics frameworks that significantly improves I/O efficiency and scales to process petabytes of data. To do so, Riffle efficiently merges fragmented intermediate shuffle files into larger block files, and thus converts small, random disk I/O requests into large, sequential ones. Riffle further improves performance and fault tolerance by mixing both merged and unmerged block files to minimize merge operation overhead. Using Riffle, Facebook production jobs on Spark clusters with over 1,000 executors experience up to a 10x reduction in the number of shuffle I/O requests and 40% improvement in the end-to-end job completion time.

Original languageEnglish (US)
Title of host publicationProceedings of the 13th EuroSys Conference, EuroSys 2018
PublisherAssociation for Computing Machinery, Inc
ISBN (Electronic)9781450355841
DOIs
StatePublished - Apr 23 2018
Event13th EuroSys Conference, EuroSys 2018 - Porto, Portugal
Duration: Apr 23 2018Apr 26 2018

Publication series

NameProceedings of the 13th EuroSys Conference, EuroSys 2018
Volume2018-January

Other

Other13th EuroSys Conference, EuroSys 2018
CountryPortugal
CityPorto
Period4/23/184/26/18

Fingerprint

Fault tolerance
Electric sparks
Data storage equipment
Processing
Big data

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Zhang, H., Cho, B., Seyfe, E., Ching, A., & Freedman, M. J. (2018). Riffle: Optimized Shuffle Service for Large-Scale Data Analytics. In Proceedings of the 13th EuroSys Conference, EuroSys 2018 (Proceedings of the 13th EuroSys Conference, EuroSys 2018; Vol. 2018-January). Association for Computing Machinery, Inc. https://doi.org/10.1145/3190508.3190534
Zhang, Haoyu ; Cho, Brian ; Seyfe, Ergin ; Ching, Avery ; Freedman, Michael Joseph. / Riffle : Optimized Shuffle Service for Large-Scale Data Analytics. Proceedings of the 13th EuroSys Conference, EuroSys 2018. Association for Computing Machinery, Inc, 2018. (Proceedings of the 13th EuroSys Conference, EuroSys 2018).
@inproceedings{e5c84ff81dc044bfa659c6793f0ca7d4,
title = "Riffle: Optimized Shuffle Service for Large-Scale Data Analytics",
abstract = "The rapidly growing size of data and complexity of analytics present new challenges for large-scale data processing systems. Modern systems keep data partitions in memory for pipelined operators, and persist data across stages with wide dependencies on disks for fault tolerance. While processing can often scale well by splitting jobs into smaller tasks for better parallelism, all-to-all data transfer—called shuffle operations—become the scaling bottleneck when running many small tasks in multi-stage data analytics jobs. Our key observation is that this bottleneck is due to the superlinear increase in disk I/O operations as data volume increases. We present Riffle, an optimized shuffle service for big-data analytics frameworks that significantly improves I/O efficiency and scales to process petabytes of data. To do so, Riffle efficiently merges fragmented intermediate shuffle files into larger block files, and thus converts small, random disk I/O requests into large, sequential ones. Riffle further improves performance and fault tolerance by mixing both merged and unmerged block files to minimize merge operation overhead. Using Riffle, Facebook production jobs on Spark clusters with over 1,000 executors experience up to a 10x reduction in the number of shuffle I/O requests and 40{\%} improvement in the end-to-end job completion time.",
author = "Haoyu Zhang and Brian Cho and Ergin Seyfe and Avery Ching and Freedman, {Michael Joseph}",
year = "2018",
month = "4",
day = "23",
doi = "10.1145/3190508.3190534",
language = "English (US)",
series = "Proceedings of the 13th EuroSys Conference, EuroSys 2018",
publisher = "Association for Computing Machinery, Inc",
booktitle = "Proceedings of the 13th EuroSys Conference, EuroSys 2018",

}

Zhang, H, Cho, B, Seyfe, E, Ching, A & Freedman, MJ 2018, Riffle: Optimized Shuffle Service for Large-Scale Data Analytics. in Proceedings of the 13th EuroSys Conference, EuroSys 2018. Proceedings of the 13th EuroSys Conference, EuroSys 2018, vol. 2018-January, Association for Computing Machinery, Inc, 13th EuroSys Conference, EuroSys 2018, Porto, Portugal, 4/23/18. https://doi.org/10.1145/3190508.3190534

Riffle : Optimized Shuffle Service for Large-Scale Data Analytics. / Zhang, Haoyu; Cho, Brian; Seyfe, Ergin; Ching, Avery; Freedman, Michael Joseph.

Proceedings of the 13th EuroSys Conference, EuroSys 2018. Association for Computing Machinery, Inc, 2018. (Proceedings of the 13th EuroSys Conference, EuroSys 2018; Vol. 2018-January).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

TY - GEN

T1 - Riffle

T2 - Optimized Shuffle Service for Large-Scale Data Analytics

AU - Zhang, Haoyu

AU - Cho, Brian

AU - Seyfe, Ergin

AU - Ching, Avery

AU - Freedman, Michael Joseph

PY - 2018/4/23

Y1 - 2018/4/23

N2 - The rapidly growing size of data and complexity of analytics present new challenges for large-scale data processing systems. Modern systems keep data partitions in memory for pipelined operators, and persist data across stages with wide dependencies on disks for fault tolerance. While processing can often scale well by splitting jobs into smaller tasks for better parallelism, all-to-all data transfer—called shuffle operations—become the scaling bottleneck when running many small tasks in multi-stage data analytics jobs. Our key observation is that this bottleneck is due to the superlinear increase in disk I/O operations as data volume increases. We present Riffle, an optimized shuffle service for big-data analytics frameworks that significantly improves I/O efficiency and scales to process petabytes of data. To do so, Riffle efficiently merges fragmented intermediate shuffle files into larger block files, and thus converts small, random disk I/O requests into large, sequential ones. Riffle further improves performance and fault tolerance by mixing both merged and unmerged block files to minimize merge operation overhead. Using Riffle, Facebook production jobs on Spark clusters with over 1,000 executors experience up to a 10x reduction in the number of shuffle I/O requests and 40% improvement in the end-to-end job completion time.

AB - The rapidly growing size of data and complexity of analytics present new challenges for large-scale data processing systems. Modern systems keep data partitions in memory for pipelined operators, and persist data across stages with wide dependencies on disks for fault tolerance. While processing can often scale well by splitting jobs into smaller tasks for better parallelism, all-to-all data transfer—called shuffle operations—become the scaling bottleneck when running many small tasks in multi-stage data analytics jobs. Our key observation is that this bottleneck is due to the superlinear increase in disk I/O operations as data volume increases. We present Riffle, an optimized shuffle service for big-data analytics frameworks that significantly improves I/O efficiency and scales to process petabytes of data. To do so, Riffle efficiently merges fragmented intermediate shuffle files into larger block files, and thus converts small, random disk I/O requests into large, sequential ones. Riffle further improves performance and fault tolerance by mixing both merged and unmerged block files to minimize merge operation overhead. Using Riffle, Facebook production jobs on Spark clusters with over 1,000 executors experience up to a 10x reduction in the number of shuffle I/O requests and 40% improvement in the end-to-end job completion time.

UR - http://www.scopus.com/inward/record.url?scp=85052023490&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85052023490&partnerID=8YFLogxK

U2 - 10.1145/3190508.3190534

DO - 10.1145/3190508.3190534

M3 - Conference contribution

T3 - Proceedings of the 13th EuroSys Conference, EuroSys 2018

BT - Proceedings of the 13th EuroSys Conference, EuroSys 2018

PB - Association for Computing Machinery, Inc

ER -

Zhang H, Cho B, Seyfe E, Ching A, Freedman MJ. Riffle: Optimized Shuffle Service for Large-Scale Data Analytics. In Proceedings of the 13th EuroSys Conference, EuroSys 2018. Association for Computing Machinery, Inc. 2018. (Proceedings of the 13th EuroSys Conference, EuroSys 2018). https://doi.org/10.1145/3190508.3190534