PoS - Proceedings of Science
Volume 415 - International Symposium on Grids & Clouds 2022 (ISGC2022) - Data Management & Big Data Session
Exploiting Big Data solutions for CMS computing operations analytics
S. Gasperini*, S. Rossi Tisbeni, D. Bonacorsi and D. Lange
Published on: September 28, 2022
Computing operations at the Large Hadron Collider (LHC) at CERN rely on the Worldwide
LHC Computing Grid (WLCG) infrastructure, designed to efficiently allow storage, access, and
processing of data at the pre-exascale level. A detailed study of the computing systems exploited
for the LHC physics mission is an increasingly crucial aspect of the roadmap of High Energy
Physics (HEP) towards the exascale regime.
In this context, over the last few years the Compact Muon Solenoid (CMS) experiment has collected
and stored a large set of heterogeneous non-collision data (e.g. meta-data about replica
placement, transfer operations, and actual user access to physics datasets). All this data
currently resides on a distributed Hadoop cluster, organized so that running fast, arbitrary
queries with the Spark analytics framework is a viable approach for Big Data mining efforts.
Using a data-driven approach focused on the analysis of the meta-data produced by several CMS
computing services, such as DBS (Data Bookkeeping Service) and MCM (Monte Carlo Management
system), we began by studying data storage and data access over the WLCG infrastructure, and we
drafted an early-stage software toolkit to investigate recurrent patterns and provide indicators
of the popularity of physics datasets. As a long-term goal, this work aims to contribute to the
overall design of a predictive/adaptive system that would eventually reduce the cost and
complexity of CMS computing operations, while taking into account the stringent requirements of
the physics analysis community.
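The abstract does not specify how the popularity indicators are computed. As a minimal sketch only, assuming a hypothetical access-record schema (dataset name, user, day) that is not taken from the CMS services, a simple indicator could count, per dataset, the total accesses, the distinct users, and the distinct days with activity, and then rank datasets by those counts. The names `AccessRecord`, `popularity` and `rank`, as well as the dataset names in the example, are illustrative and hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class AccessRecord(NamedTuple):
    """Hypothetical schema: one row per user access to a physics dataset."""
    dataset: str
    user: str
    day: str  # ISO date, e.g. "2021-03-01"


def popularity(records: Iterable[AccessRecord]) -> Dict[str, Dict[str, int]]:
    """Per-dataset totals: accesses, distinct users, distinct active days."""
    accesses: Dict[str, int] = defaultdict(int)
    users: Dict[str, Set[str]] = defaultdict(set)
    days: Dict[str, Set[str]] = defaultdict(set)
    for rec in records:
        accesses[rec.dataset] += 1
        users[rec.dataset].add(rec.user)
        days[rec.dataset].add(rec.day)
    return {
        ds: {"accesses": n, "users": len(users[ds]), "days": len(days[ds])}
        for ds, n in accesses.items()
    }


def rank(stats: Dict[str, Dict[str, int]]) -> List[str]:
    """Order datasets by distinct users, then by total accesses (both descending)."""
    return sorted(
        stats,
        key=lambda ds: (stats[ds]["users"], stats[ds]["accesses"]),
        reverse=True,
    )


if __name__ == "__main__":
    # Illustrative records with made-up dataset names, not real CMS data.
    sample = [
        AccessRecord("/DatasetA/Run2018/AOD", "alice", "2021-03-01"),
        AccessRecord("/DatasetA/Run2018/AOD", "bob", "2021-03-01"),
        AccessRecord("/DatasetA/Run2018/AOD", "alice", "2021-03-02"),
        AccessRecord("/DatasetB/Run2018/MINIAOD", "carol", "2021-03-02"),
    ]
    stats = popularity(sample)
    print(stats)
    print(rank(stats))
```

On a Hadoop cluster, the same aggregation would map naturally onto a Spark DataFrame `groupBy` on the dataset column combined with `count` and `countDistinct`; the plain-Python version here only illustrates the logic on a handful of in-memory records.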
DOI: https://doi.org/10.22323/1.415.0006

Open Access
Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.