Event-by-event primary composition discrimination method using supervised machine learning

Carvalho, W. R.; Piotrowski, Lech

doi:10.22323/1.501.0212

Abstract

We have developed a radio-detection mass discrimination method for cosmic ray events. The method uses a supervised machine learning (ML) algorithm, the random forest (RF), to discriminate between light (p) and heavy (Fe) primary compositions on an event-by-event basis. It bypasses any reconstruction of the shower maximum ($X_{max}$) and instead attempts to infer the primary composition directly. As features of the random forest we used, for each triggered antenna, the distance to the shower axis, the peak amplitude of the electric field and the spectral slope. To perform the discrimination, the method also needs an estimate of the primary or electromagnetic (EM) energy of the shower along with its uncertainty, which is also taken into account. Initially we used a 2-feature approach, with only the antenna distance and the peak electric field amplitude. These test runs yielded much better accuracies than expected, especially at low zenith angles. Even with this restricted feature set and a large primary energy uncertainty of 30%, we obtained an accuracy of 82% at a zenith angle of $54^\circ$. An analysis of the random forest feature importances revealed that such good accuracies were possible because the RF was exploiting a strong dependence of the electric field amplitude on the position of $X_{max}$ to perform the discrimination. We describe and explain this amplitude dependence in detail in our other contribution to this conference. After adding the spectral slope as a third feature, we observed a significant improvement in the discrimination accuracy, which then ranged from 81% to 96%, depending on zenith angle ($\theta$). This novel approach may offer particular benefits to radio-only setups like GRAND. This work is Monte Carlo based and uses ZHAireS simulations along with RDSim to generate separate sets of events for training and testing the random forest algorithm.
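
As an illustration of the workflow described in the abstract, the following is a minimal sketch of a random forest trained on per-antenna features (distance to the shower axis, peak amplitude, spectral slope) plus a smeared energy estimate, with separate training and test sets and a grouped feature-importance readout. It is not the authors' implementation. The toy event generator, the fixed antenna count `N_ANT`, the class shifts, and the choice to pass the 30%-smeared energy as an extra feature are all assumptions made for illustration; the toy data carry no physical meaning and stand in for the ZHAireS/RDSim events used in the actual work. The classifier is scikit-learn's `RandomForestClassifier`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_ANT = 5                       # hypothetical fixed antenna count per event
rng = np.random.default_rng(0)  # fixed seed for reproducibility

def toy_events(n, label):
    """Toy events only; label 0 = p-like, 1 = Fe-like. Not a physical model."""
    dist = np.sort(rng.uniform(100.0, 1000.0, size=(n, N_ANT)), axis=1)
    energy = 10.0 ** rng.uniform(0.0, 1.0, size=n)  # arbitrary units
    # Arbitrary class shifts, chosen only so the toy classes are separable.
    amp = energy[:, None] * np.exp(-dist / 500.0) * (1.0 + 0.3 * label)
    amp *= rng.normal(1.0, 0.05, size=amp.shape)
    slope = -1.0 + 0.3 * label + rng.normal(0.0, 0.1, size=dist.shape)
    # Assumed treatment of the energy uncertainty: 30% Gaussian smearing.
    energy_est = energy * rng.normal(1.0, 0.30, size=n)
    X = np.hstack([dist, amp, slope, energy_est[:, None]])
    return X, np.full(n, label)

def make_set(n_per_class):
    """Stack equal numbers of toy events from both classes."""
    parts = [toy_events(n_per_class, label) for label in (0, 1)]
    X = np.vstack([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    return X, y

X_train, y_train = make_set(1000)  # training set
X_test, y_test = make_set(500)     # independently generated test set

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Sum the per-column importances by feature type.
imp = clf.feature_importances_
grouped = {
    "distance": imp[:N_ANT].sum(),
    "amplitude": imp[N_ANT:2 * N_ANT].sum(),
    "slope": imp[2 * N_ANT:3 * N_ANT].sum(),
    "energy": imp[3 * N_ANT],
}
print(f"toy accuracy = {accuracy:.3f}")
print(grouped)
```

Grouping the importances by feature type mirrors the kind of inspection that, in the actual analysis, pointed to the amplitude dependence on $X_{max}$. Real events have a variable number of triggered antennas, so a fixed-width feature vector as used here would need padding or a per-antenna encoding.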