Acorn: Aggressive Result Caching in Distributed Data Processing Frameworks

Lana Ramjit, Matteo Interlandi, Eugene Wu, Ravi Netravali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Result caching is crucial to the performance of data processing systems, but two trends complicate its use. First, immutable datasets make it difficult to efficiently employ powerful result caching techniques like predicate analysis, since predicate analysis typically requires optimized query plans but generating those plans can be costly with data immutability. Second, increased support for user-defined functions (UDFs), which are treated as black boxes by query engines, hinders aggressive result caching. This paper overcomes these problems by introducing 1) a judicious adaptation of predicate analysis on analyzed query plans that avoids unnecessary query optimization, and 2) a UDF translator that transparently compiles UDFs from general purpose languages into native equivalents. We then present Acorn, a concrete implementation of these techniques in Spark SQL that provides speedups of up to 5x across multiple benchmark and real Spark graph processing workloads.
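To make the idea of predicate analysis for result caching concrete, here is a minimal Python sketch. It is not Acorn's implementation, which according to the abstract works on analyzed query plans in Spark SQL. The names `Range` and `ResultCache` are hypothetical, and the sketch assumes single-column range filters over an in-memory list of rows. It shows only the core reuse test: a cached result can answer a new query when the cached predicate contains the new one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Hypothetical half-open range predicate: lo <= row[column] < hi."""
    column: str
    lo: float
    hi: float

    def contains(self, other: "Range") -> bool:
        # True when every row satisfying `other` also satisfies `self`.
        return (self.column == other.column
                and self.lo <= other.lo
                and other.hi <= self.hi)

    def matches(self, row: dict) -> bool:
        return self.lo <= row[self.column] < self.hi


class ResultCache:
    """Toy result cache keyed by predicate (an assumption for illustration)."""

    def __init__(self):
        self._entries = []  # list of (Range, cached rows)
        self.hits = 0

    def query(self, table, pred):
        # Reuse a cached result whose predicate subsumes the new one,
        # re-filtering the (smaller) cached rows instead of the full table.
        for cached_pred, rows in self._entries:
            if cached_pred.contains(pred):
                self.hits += 1
                return [r for r in rows if pred.matches(r)]
        # Cache miss: scan the base table and remember the result.
        rows = [r for r in table if pred.matches(r)]
        self._entries.append((pred, rows))
        return rows


table = [{"age": a} for a in range(100)]
cache = ResultCache()
wide = cache.query(table, Range("age", 10, 50))    # miss: scans the table
narrow = cache.query(table, Range("age", 20, 30))  # hit: filters cached rows
```

In this sketch the second query never touches the base table, because its range lies inside the cached one. A real engine would have to perform the same containment check on query-plan expressions rather than on simple intervals.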

Original language: English (US)
Title of host publication: SoCC 2019 - Proceedings of the ACM Symposium on Cloud Computing
Publisher: Association for Computing Machinery
Pages: 206-219
Number of pages: 14
ISBN (Electronic): 9781450369732
State: Published - Nov 20 2019
Externally published: Yes
Event: 10th ACM Symposium on Cloud Computing, SoCC 2019 - Santa Cruz, United States
Duration: Nov 20 2019 → Nov 23 2019

Publication series

Name: SoCC 2019 - Proceedings of the ACM Symposium on Cloud Computing

Conference

Conference: 10th ACM Symposium on Cloud Computing, SoCC 2019
Country/Territory: United States
City: Santa Cruz
Period: 11/20/19 → 11/23/19

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computational Theory and Mathematics

Keywords

  • computation reuse
  • data analytics frameworks
  • materialized views
  • result caching
  • user-defined functions
