Volume 350 - 7th Annual Conference on Large Hadron Collider Physics (LHCP2019) - Parallel Performance
Using ML techniques for Data Quality Monitoring in CMS and ALICE experiments
K.R. Deja* on behalf of the CMS and ALICE Collaborations
*corresponding author
Pre-published on: 2019 October 16
Published on:
Abstract
Data Quality Assurance plays an important role in all high-energy physics experiments. Currently used methods rely heavily on manual labour and human expert judgement. Hence, multiple attempts are being made to develop automatic solutions, especially ones based on machine learning techniques, as the core part of Data Quality Monitoring systems.
However, anomalies caused by detector malfunctioning or sub-optimal data processing are difficult to enumerate a priori and occur rarely, which makes supervised classification hard to apply. Therefore, researchers from different experiments, including ALICE and CMS, work extensively on semi-supervised and unsupervised algorithms in order to distinguish potential outliers without manually assigned labels.

In this contribution, we discuss several projects that aim to solve this task. Machine learning based solutions bring several advantages and may provide fast and reliable data quality assurance while reducing manpower requirements. A good example of this approach is a model based on a deep autoencoder employed in the CMS experiment, which has been successfully qualified on CMS data collected during the 2016 LHC run. Tests indicate that this solution is able to detect anomalies with high accuracy and a low fake rate when compared against the outcome of manual labelling by experts.
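
The idea behind autoencoder-based anomaly detection can be illustrated with a minimal sketch: fit a compressing model on reference data only, then flag samples that it reconstructs poorly. The example below is not the CMS model. It swaps the deep autoencoder for a linear one (equivalent to PCA) so that it runs with NumPy alone, and the synthetic "histograms", the latent size, and the 99% threshold are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_autoencoder(good, n_latent):
    """Fit a linear encoder/decoder (PCA) on reference data only."""
    mean = good.mean(axis=0)
    # Rows of vt are principal directions; keep the leading n_latent of them.
    _, _, vt = np.linalg.svd(good - mean, full_matrices=False)
    return mean, vt[:n_latent]

def reconstruction_error(x, mean, components):
    """Mean squared error between each sample and its reconstruction."""
    centred = x - mean
    latent = centred @ components.T   # encode into the latent space
    restored = latent @ components    # decode back to the input space
    return ((centred - restored) ** 2).mean(axis=1)

rng = np.random.default_rng(0)

# Hypothetical reference data: 500 samples of 20 bins lying close to a
# 3-dimensional subspace, plus a small amount of noise.
basis = rng.normal(size=(3, 20))
good = rng.normal(size=(500, 3)) @ basis + 0.05 * rng.normal(size=(500, 20))

mean, comps = fit_linear_autoencoder(good, n_latent=3)

# Assumed working point: flag anything worse than 99% of the reference data.
threshold = np.quantile(reconstruction_error(good, mean, comps), 0.99)

# Hypothetical anomalies: same structure, but with a large distortion.
bad = rng.normal(size=(5, 3)) @ basis + 2.0 * rng.normal(size=(5, 20))
flags = reconstruction_error(bad, mean, comps) > threshold
```

No anomaly labels are needed for training; labels from experts would enter only when evaluating accuracy and fake rate, as described above.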

Researchers from the ALICE experiment are currently working on a similar task. They intend to perform data quality checks at much higher granularity. The current approach is limited to run classification based on manually set cut-offs on descriptive data statistics. More sophisticated machine learning based methods may enable more accurate data selection at the high-granularity level of 15-minute data acquisition periods.
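
A cut-off based classification of the kind described above might look like the following sketch. The statistic names, the cut-off ranges, and the numbers are hypothetical and chosen only to show why granularity matters: a run can pass on its averaged statistics while one of its 15-minute periods falls outside the allowed range.

```python
# Hypothetical cut-offs: allowed (low, high) range per descriptive statistic.
CUTS = {
    "mean_clusters": (80.0, 120.0),
    "mean_pt": (0.4, 0.8),
}

def classify(stats, cuts=CUTS):
    """Return 'good' if every statistic lies inside its range, else 'bad'."""
    for name, (low, high) in cuts.items():
        if not (low <= stats[name] <= high):
            return "bad"
    return "good"

# Three hypothetical 15-minute periods belonging to the same run.
periods = [
    {"mean_clusters": 101.0, "mean_pt": 0.61},
    {"mean_clusters": 99.0, "mean_pt": 0.59},
    {"mean_clusters": 100.0, "mean_pt": 0.95},
]

# Run-level statistics, here taken as the plain average over the periods.
run_level = {
    name: sum(p[name] for p in periods) / len(periods) for name in CUTS
}

run_label = classify(run_level)
period_labels = [classify(p) for p in periods]
```

In this toy case the run as a whole passes the cut-offs, while the third period alone would be rejected.
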
Open Access
Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.