TY - GEN
T1 - Riffle
T2 - 13th EuroSys Conference, EuroSys 2018
AU - Zhang, Haoyu
AU - Cho, Brian
AU - Seyfe, Ergin
AU - Ching, Avery
AU - Freedman, Michael J.
N1 - Publisher Copyright: © 2018 Copyright held by the owner/author(s).
PY - 2018/4/23
Y1 - 2018/4/23
N2 - The rapidly growing size of data and complexity of analytics present new challenges for large-scale data processing systems. Modern systems keep data partitions in memory for pipelined operators, and persist data across stages with wide dependencies on disks for fault tolerance. While processing can often scale well by splitting jobs into smaller tasks for better parallelism, all-to-all data transfers, called shuffle operations, become the scaling bottleneck when running many small tasks in multi-stage data analytics jobs. Our key observation is that this bottleneck is due to the superlinear increase in disk I/O operations as data volume increases. We present Riffle, an optimized shuffle service for big-data analytics frameworks that significantly improves I/O efficiency and scales to process petabytes of data. To do so, Riffle efficiently merges fragmented intermediate shuffle files into larger block files, and thus converts small, random disk I/O requests into large, sequential ones. Riffle further improves performance and fault tolerance by mixing both merged and unmerged block files to minimize merge operation overhead. Using Riffle, Facebook production jobs on Spark clusters with over 1,000 executors experience up to a 10x reduction in the number of shuffle I/O requests and a 40% improvement in the end-to-end job completion time.
AB - The rapidly growing size of data and complexity of analytics present new challenges for large-scale data processing systems. Modern systems keep data partitions in memory for pipelined operators, and persist data across stages with wide dependencies on disks for fault tolerance. While processing can often scale well by splitting jobs into smaller tasks for better parallelism, all-to-all data transfers, called shuffle operations, become the scaling bottleneck when running many small tasks in multi-stage data analytics jobs. Our key observation is that this bottleneck is due to the superlinear increase in disk I/O operations as data volume increases. We present Riffle, an optimized shuffle service for big-data analytics frameworks that significantly improves I/O efficiency and scales to process petabytes of data. To do so, Riffle efficiently merges fragmented intermediate shuffle files into larger block files, and thus converts small, random disk I/O requests into large, sequential ones. Riffle further improves performance and fault tolerance by mixing both merged and unmerged block files to minimize merge operation overhead. Using Riffle, Facebook production jobs on Spark clusters with over 1,000 executors experience up to a 10x reduction in the number of shuffle I/O requests and a 40% improvement in the end-to-end job completion time.
KW - Big-Data Analytics Frameworks
KW - I/O Optimization
KW - Shuffle Service
KW - Storage
UR - http://www.scopus.com/inward/record.url?scp=85052023490&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052023490&partnerID=8YFLogxK
U2 - 10.1145/3190508.3190534
DO - 10.1145/3190508.3190534
M3 - Conference contribution
AN - SCOPUS:85052023490
T3 - Proceedings of the 13th EuroSys Conference, EuroSys 2018
BT - Proceedings of the 13th EuroSys Conference, EuroSys 2018
PB - Association for Computing Machinery, Inc
Y2 - 23 April 2018 through 26 April 2018
ER -