Volume 501 - 39th International Cosmic Ray Conference (ICRC2025) - Cosmic-Ray Indirect
Event-by-event primary composition discrimination method using supervised machine learning
W.R. Carvalho* and L. Piotrowski
*: corresponding author
Pre-published on: September 23, 2025
Abstract
We have developed a mass discrimination method for cosmic ray events based on radio detection. This method uses supervised machine learning (ML) algorithms, namely random forests (RF), to discriminate between light (p) and heavy (Fe) primary compositions on an event-by-event basis. It bypasses any reconstruction of the shower maximum ($X_{max}$) and instead attempts to infer the primary composition directly. As features for the random forest we used, for each triggered antenna, the distance to the shower axis, the peak amplitude of the electric field and the spectral slope. To perform the discrimination, the method also needs an estimate of the primary or electromagnetic (EM) energy of the shower along with its uncertainty, which is also taken into account. Initially we used a two-feature approach, with only the antenna distance and the peak electric field amplitude. Even so, these test runs yielded much better accuracies than expected, especially at low zenith angles ($\theta$). Even with this restricted feature set and a large primary energy uncertainty of 30%, we obtained an accuracy of 82% at a zenith angle of $54^\circ$. An analysis of the random forest feature importances revealed that such good accuracies were possible because the RF was exploiting a strong dependence of the electric field amplitude on the position of $X_{max}$ to perform the discrimination. We describe this amplitude dependence and explain it in detail in our other contribution to this conference. After adding the spectral slope as a third feature, we observed a significant improvement in the discrimination accuracy, which now varied from 81% to 96%, depending on $\theta$. This novel approach may offer particular benefits to radio-only setups such as GRAND. This work is Monte Carlo based and uses ZHAireS simulations along with RDSim to generate separate sets of events for training and testing the random forest algorithm.
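The workflow described in the abstract (per-antenna features plus a smeared energy estimate, separate training and testing sets, and an inspection of feature importances) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the contribution: the use of scikit-learn, the fixed number of antennas per event (`N_ANT`), the flattening of per-antenna features into one vector, the inclusion of the energy estimate as an extra feature, and above all the toy event generator `make_toy_events`, whose invented dependences stand in for ZHAireS and RDSim output and carry no physical meaning.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_ANT = 8  # assumed fixed number of triggered antennas per event (hypothetical)


def make_toy_events(n_events, energy_sigma=0.30, seed=0):
    """Generate purely synthetic events; this is NOT ZHAireS/RDSim output."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, n_events)          # 0 = light (p), 1 = heavy (Fe)
    true_energy = rng.uniform(0.5, 2.0, n_events)  # arbitrary units
    # Energy estimate smeared by a relative Gaussian uncertainty (30% by default).
    est_energy = true_energy * (1.0 + energy_sigma * rng.normal(size=n_events))
    # Antenna distances to the shower axis, sorted per event (arbitrary units).
    dist = np.sort(rng.uniform(50.0, 500.0, (n_events, N_ANT)), axis=1)
    # Toy latent variable standing in for the X_max position, shifted for Fe.
    latent = rng.normal(0.0, 1.0, n_events) - 1.5 * labels
    # Invented dependences of peak amplitude and spectral slope on the latent.
    amp = (true_energy[:, None] * np.exp(-dist / 200.0)
           * (1.0 + 0.2 * latent[:, None])
           * (1.0 + 0.05 * rng.normal(size=dist.shape)))
    slope = (-1.0 + 0.3 * latent[:, None] - dist / 1000.0
             + 0.1 * rng.normal(size=dist.shape))
    # One flat feature vector per event: distances, amplitudes, slopes, energy.
    features = np.hstack([dist, amp, slope, est_energy[:, None]])
    return features, labels


# Separate, independently generated sets for training and testing.
X_train, y_train = make_toy_events(2000, seed=1)
X_test, y_test = make_toy_events(500, seed=2)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

accuracy = forest.score(X_test, y_test)      # fraction of correctly classified events
importances = forest.feature_importances_    # one value per feature column
print(f"toy accuracy: {accuracy:.3f}")
print("summed importance (distance, amplitude, slope, energy):",
      importances[:N_ANT].sum(),
      importances[N_ANT:2 * N_ANT].sum(),
      importances[2 * N_ANT:3 * N_ANT].sum(),
      importances[-1])
```

The accuracy printed by this sketch reflects only the invented toy model and should not be compared with the values quoted in the abstract. Summing the importances per feature group is one simple way to see which kind of input the forest relies on most.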
DOI: https://doi.org/10.22323/1.501.0212
Open Access
Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.