TY - GEN
T1 - Fast access to columnar, hierarchically nested data via code transformation
AU - Pivarski, Jim
AU - Elmer, Peter
AU - Bockelman, Brian
AU - Zhang, Zhe
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/7/1
Y1 - 2017/7/1
N2 - Big Data query systems represent data in a columnar format for fast, selective access, and in some cases (e.g. Apache Drill), perform calculations directly on the columnar data without row materialization, avoiding runtime costs. However, many analysis procedures cannot be easily or efficiently expressed as SQL. In High Energy Physics, the majority of data processing requires nested loops with complex dependencies. When faced with tasks like these, the conventional approach is to convert the columnar data back into an object form, usually with a performance price. This paper describes a new technique to transform procedural code so that it operates on hierarchically nested, columnar data natively, without row materialization. It can be viewed as a compiler pass on the typed abstract syntax tree, rewriting references to objects as columnar array lookups. We will also present performance comparisons between transformed code and conventional object-oriented code in a High Energy Physics context.
AB - Big Data query systems represent data in a columnar format for fast, selective access, and in some cases (e.g. Apache Drill), perform calculations directly on the columnar data without row materialization, avoiding runtime costs. However, many analysis procedures cannot be easily or efficiently expressed as SQL. In High Energy Physics, the majority of data processing requires nested loops with complex dependencies. When faced with tasks like these, the conventional approach is to convert the columnar data back into an object form, usually with a performance price. This paper describes a new technique to transform procedural code so that it operates on hierarchically nested, columnar data natively, without row materialization. It can be viewed as a compiler pass on the typed abstract syntax tree, rewriting references to objects as columnar array lookups. We will also present performance comparisons between transformed code and conventional object-oriented code in a High Energy Physics context.
KW - Automatic programming
KW - Big data applications
KW - Data analysis
KW - High energy physics instrumentation computing
KW - Scientific computing
UR - http://www.scopus.com/inward/record.url?scp=85047783089&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047783089&partnerID=8YFLogxK
U2 - 10.1109/BigData.2017.8257933
DO - 10.1109/BigData.2017.8257933
M3 - Conference contribution
AN - SCOPUS:85047783089
T3 - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
SP - 253
EP - 262
BT - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
A2 - Nie, Jian-Yun
A2 - Obradovic, Zoran
A2 - Suzumura, Toyotaro
A2 - Ghosh, Rumi
A2 - Nambiar, Raghunath
A2 - Wang, Chonggang
A2 - Zang, Hui
A2 - Baeza-Yates, Ricardo
A2 - Baeza-Yates, Ricardo
A2 - Hu, Xiaohua
A2 - Kepner, Jeremy
A2 - Cuzzocrea, Alfredo
A2 - Tang, Jian
A2 - Toyoda, Masashi
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE International Conference on Big Data, Big Data 2017
Y2 - 11 December 2017 through 14 December 2017
ER -