Building a scalable python distribution for HEP data analysis

Research output: Contribution to journal, Conference article, peer-review


There are numerous approaches to building analysis applications across the high-energy physics (HEP) community. Among them are Python-based, or at least Python-driven, analysis workflows. We aim to ease the adoption of a Python-based analysis toolkit by making it easier for non-expert users to gain access to Python tools for scientific analysis. Experimental software distributions and individual user analyses have quite different requirements: distributions tend to worry most about stability, usability and reproducibility, while users usually strive to be fast and nimble. We discuss how we built, and now maintain, a Python distribution for analysis that satisfies the requirements of both a large software distribution (in our case, CMSSW) and user-level, or laptop-level, analysis. We pursued the integration of tools used by the broader data science community as well as HEP-developed Python packages (e.g., histogrammar, root-numpy). We discuss the concepts we investigated for package integration and testing, as well as the issues we encountered in this process. Distribution and platform support are important topics; we discuss our approach and progress towards a sustainable infrastructure for supporting this Python stack for the CMS user community and for the broader HEP user community.
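
The abstract does not spell out the testing procedure. As a minimal, hypothetical sketch of one kind of package-integration check a distribution could run after each rebuild, the Python function below tries to import every package in a stack and records the version each one reports. The function name `smoke_test_imports`, the example package list, and the use of `__version__` as the version source are illustrative assumptions, not the authors' tooling.

```python
import importlib
from typing import Dict, Iterable, Optional


def smoke_test_imports(module_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Try to import each module and record the version it reports.

    Returns a mapping from module name to:
      - the module's __version__ (as a string), if it defines one,
      - "unknown" if it imports but exposes no __version__,
      - None if the import fails for any reason.
    """
    results: Dict[str, Optional[str]] = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A missing package raises ImportError, but a broken build
            # (e.g. a binary mismatch) can raise other errors on import,
            # so any failure is recorded as None.
            results[name] = None
            continue
        results[name] = str(getattr(module, "__version__", "unknown"))
    return results


if __name__ == "__main__":
    # Hypothetical stack contents; a real distribution would list its own.
    stack = ["numpy", "scipy", "matplotlib", "root_numpy", "histogrammar"]
    for name, version in smoke_test_imports(stack).items():
        print(f"{name:15s} {'MISSING' if version is None else version}")
```

A check like this only confirms that packages import together in one environment; it says nothing about numerical correctness, which would need each package's own test suite.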

Original language: English (US)
Article number: 042041
Journal: Journal of Physics: Conference Series
Issue number: 4
State: Published - Oct 18 2018
Event: 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research, ACAT 2017 - Seattle, United States
Duration: Aug 21 2017 to Aug 25 2017

All Science Journal Classification (ASJC) codes

  • General Physics and Astronomy

