FAIR Principles for data and AI models in high energy physics research and education

Roy, Avik; on behalf of the FAIR4HEP collaboration

doi:10.22323/1.414.0240

Abstract

In recent years, digital object management practices that support findability, accessibility, interoperability, and reusability (FAIR) have begun to be adopted across a number of data-intensive scientific disciplines. Such digital objects include datasets, AI models, software, notebooks, workflows, and documentation. With the collective dataset at the Large Hadron Collider scheduled to reach the zettabyte scale by the end of 2032, the experimental particle physics community faces unprecedented data management challenges. These grand challenges are expected to be addressable by end-to-end AI frameworks that combine datasets that are both FAIR and AI-ready, advances in AI, modern computing environments, and scientific data infrastructure. In this work, the FAIR4HEP collaboration explores how the FAIR principles can be interpreted for data and AI models in experimental high energy physics research. We investigate metrics that quantify the FAIRness of experimental datasets and AI models, and provide open-source notebooks that guide new users in applying the FAIR principles in practice.