TY - GEN
T1 - Acorn: Aggressive Result Caching in Distributed Data Processing Frameworks
T2 - 10th ACM Symposium on Cloud Computing, SoCC 2019
AU - Ramjit, Lana
AU - Interlandi, Matteo
AU - Wu, Eugene
AU - Netravali, Ravi
N1 - Publisher Copyright: © 2019 ACM.
PY - 2019/11/20
Y1 - 2019/11/20
AB - Result caching is crucial to the performance of data processing systems, but two trends complicate its use. First, immutable datasets make it difficult to efficiently employ powerful result caching techniques like predicate analysis, since predicate analysis typically requires optimized query plans, but generating those plans can be costly with data immutability. Second, increased support for user-defined functions (UDFs), which are treated as black boxes by query engines, hinders aggressive result caching. This paper overcomes these problems by introducing 1) a judicious adaptation of predicate analysis on analyzed query plans that avoids unnecessary query optimization, and 2) a UDF translator that transparently compiles UDFs from general-purpose languages into native equivalents. We then present Acorn, a concrete implementation of these techniques in Spark SQL that provides speedups of up to 5x across multiple benchmark and real Spark graph processing workloads.
KW - computation reuse
KW - data analytics frameworks
KW - materialized views
KW - result caching
KW - user-defined functions
UR - http://www.scopus.com/inward/record.url?scp=85091777484&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091777484&partnerID=8YFLogxK
U2 - 10.1145/3357223.3362702
DO - 10.1145/3357223.3362702
M3 - Conference contribution
AN - SCOPUS:85091777484
T3 - SoCC 2019 - Proceedings of the ACM Symposium on Cloud Computing
SP - 206
EP - 219
BT - SoCC 2019 - Proceedings of the ACM Symposium on Cloud Computing
PB - Association for Computing Machinery
Y2 - 20 November 2019 through 23 November 2019
ER -