Volume 476 - 42nd International Conference on High Energy Physics (ICHEP2024) - Computing and Data Handling
Enhancing CMS data analyses using a distributed high throughput platform
T. Diotalevi*, C. Battilana, A. Fanfani and D. Bonacorsi
*: corresponding author
Pre-published on: January 09, 2025
Published on: April 29, 2025
Abstract
A flexible and dynamic environment capable of efficiently accessing distributed data and resources is a key aspect of HEP data analysis, especially in the HL-LHC era. A quasi-interactive declarative solution such as ROOT RDataFrame, with scale-up capabilities via open-source standards such as Dask, can profit from the DataLake model under development at the Italian "HPC, Big Data and Quantum Computing" Center. The starting point is a prototype CMS high throughput analysis platform, offloaded to a local Tier-2.
This contribution evaluates the scalability, identifies the bottlenecks and explores the interactivity of such a platform on two use cases: a CMS physics analysis with high-rate triggered events, and a study of the CMS muon detector performance in phase-space regions driven by analysis needs, accessing detector datasets. The metrics used to evaluate the scaling and speed-up performance will be reported and the results will be discussed, emphasising the differences with respect to legacy analysis workflows.
DOI: https://doi.org/10.22323/1.476.1007
Open Access
Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.