PoS - Proceedings of Science
Volume 340 - The 39th International Conference on High Energy Physics (ICHEP2018) - Posters
Support system for ATLAS distributed computing operations
T. Kishimoto* on behalf of the ATLAS Collaboration
Published on: August 02, 2019
Abstract
The ATLAS distributed computing system has allowed the experiment to successfully meet the challenges of LHC Run 2. To keep distributed computing operating smoothly and efficiently, the ATLAS experiment organizes several support teams. The ADCoS (ATLAS Distributed Computing Operations Shifts) is a dedicated group of shifters who follow and report failing jobs, failing data transfers between sites, degradation of ATLAS central computing services, and more. The DAST (Distributed Analysis Support Team) provides user support to resolve issues related to running distributed analysis on the Grid. The CRC (Computing Run Coordinator) maintains a global view of the day-to-day operations.
In this paper, the status and operational experience of the support system for ATLAS distributed computing in LHC Run 2 are reported. The report also covers operational experience from the Grid site point of view and an analysis of the errors that waste the most wallclock time. The discussion of operational experience focuses on some of the more time-consuming tasks for shifters and on the introduction of new technologies, such as machine learning, to ease that work.
DOI: https://doi.org/10.22323/1.340.0797

Open Access
Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.