TY - JOUR
T1 - An array-oriented Python interface for FastJet
AU - Roy, Aryan
AU - Pivarski, Jim
AU - Freer, Chad Wells
N1 - Funding Information:
We would like to thank Matteo Cacciari, Gavin Salam, Gregory Soyez, Salvatore Rappoccio, Patrick Komiske, and Eduardo Rodrigues for helpful discussions. This work was supported by the National Science Foundation under Cooperative Agreement OAC-1836650 (IRIS-HEP).
Publisher Copyright:
© Published under licence by IOP Publishing Ltd.
PY - 2023
Y1 - 2023
AB - Analysis of HEP data is an iterative process in which the results of one step often inform the next. In an exploratory analysis, it is common to perform one computation on a collection of events, then view the results (often with histograms) to decide what to try next. Awkward Array is a Scikit-HEP Python package that enables data analysis with array-at-a-time operations, implementing cuts as slices, combinatorics as composable functions, etc. However, most C++ HEP libraries, such as FastJet, have an imperative, one-particle-at-a-time interface, which would be inefficient in Python and goes against the grain of the array-at-a-time logic of scientific Python. Therefore, we developed fastjet, a pip-installable Python package that provides FastJet C++ binaries, the classic (particle-at-a-time) Python interface, and the new array-oriented interface for use with Awkward Array. The new interface streamlines interoperability with scientific Python software beyond HEP, such as machine learning. In one case, adopting this library along with other array-oriented tools accelerated HEP analysis code by a factor of 20. It was designed to integrate easily with libraries in the Scikit-HEP ecosystem, including Uproot (file I/O), hist (histogramming), Vector (Lorentz vectors), and Coffea (high-level glue). We discuss the design of the fastjet Python library, including how the classic interface is integrated with the array-oriented interface and with the Vector library for Lorentz vector operations. The new interface was developed as open source.
UR - http://www.scopus.com/inward/record.url?scp=85149741047&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149741047&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/2438/1/012011
DO - 10.1088/1742-6596/2438/1/012011
M3 - Conference article
AN - SCOPUS:85149741047
SN - 1742-6588
VL - 2438
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 1
M1 - 012011
T2 - 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research, ACAT 2021
Y2 - 29 November 2021 through 3 December 2021
ER -