Scalable training on scalable infrastructures for programmable hardware
October 25, 2023
Machine learning (ML) and deep learning (DL) techniques are playing an increasingly pervasive and dominant role in High Energy Physics, but this poses several challenges. Effective computing infrastructures are required for executing AI workflows, and there is a growing demand for training opportunities to upskill users and developers in exploiting programmable hardware such as FPGAs. While many training opportunities exist for generic ML/DL concepts, there is a gap in hands-on tutorials on ML/DL on FPGAs that can cater to a large number of attendees and provide access to a diverse set of hardware with varying specifications. This highlights the need for scalable and inclusive training tools to bridge the gap.
INFN-Bologna, the University of Bologna, and INFN-CNAF collaborated on a pilot course on ML/DL on FPGAs, which succeeded in paving the way for a scalable toolkit for future courses. The course used virtual machines, in-house cloud platforms equipped with AMD/Xilinx Alveo FPGAs, and Amazon AWS instances for project deployment on FPGAs. Docker containers providing complete environments for the DL frameworks, together with Jupyter Notebooks, were used for the interactive exercises.
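A containerized setup of this kind might be sketched, purely for illustration, as a Docker Compose service; the image name, port, and volume paths below are hypothetical assumptions, not the actual configuration used in the course:

```yaml
# Hypothetical sketch of a per-student course environment: a container
# with DL frameworks preinstalled, serving Jupyter Notebooks for the
# interactive exercises. All names and paths are illustrative.
services:
  dl-env:
    image: fpga-course/dl-env:latest        # assumed image with DL frameworks
    ports:
      - "8888:8888"                         # Jupyter Notebook web interface
    volumes:
      - ./notebooks:/home/student/notebooks # exercise notebooks mounted in
    command: jupyter notebook --ip=0.0.0.0 --no-browser
```

Packaging the full environment in an image lets the same exercises run unchanged on local VMs, in-house cloud platforms, or AWS instances, which is what makes the approach scale to many attendees.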
Finally, the Bond Machine, a software ecosystem that can dynamically generate computer architectures synthesizable on FPGAs, is being explored as an alternative approach for teaching FPGA programming. It offers a hardware abstraction that simplifies interaction with FPGAs and avoids the need to delve into low-level details.