Volume 334 - The 36th Annual International Symposium on Lattice Field Theory (LATTICE2018) - Algorithms and Machines
Three Dirac operators on two architectures with one piece of code and no hassle
S. Durr
Published on: May 29, 2019
Abstract
A simple-minded approach to implementing three discretizations of the Dirac operator (staggered, Wilson, Brillouin) on two architectures (KNL and core-i7) is presented. The idea is to use a high-level compiler together with OpenMP parallelization and SIMD pragmas, while staying away from cache-line optimization and assembly tuning. The implementation handles $N_v$ right-hand sides, and this extra index is used to fill the SIMD pipeline. On one KNL node, single-precision performance figures for $N_c=3$, $N_v=12$ are 475 Gflop/s, 345 Gflop/s, and 790 Gflop/s for the three discretization schemes, respectively.
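The strategy described in the abstract (OpenMP threads over lattice sites, SIMD pragmas over the right-hand-side index) can be illustrated with a toy kernel. The following is only a minimal sketch in C, not the code from the paper: the function name `apply_toy_stencil`, the 1D periodic nearest-neighbour stencil, and the memory layout with the right-hand-side index running fastest are all assumptions made for illustration. A real Dirac operator would also carry gauge links together with color and spinor indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy kernel (not one of the three Dirac operators):
 *   out(x, v) = 2*in(x, v) - in(x+1, v) - in(x-1, v)
 * on a periodic 1D lattice with n_sites sites and nv right-hand sides.
 * Assumed layout: element (x, v) is stored at index x*nv + v, so the
 * right-hand-side index v is contiguous in memory.
 * The arrays in and out must not overlap. */
void apply_toy_stencil(size_t n_sites, size_t nv,
                       const float *in, float *out)
{
    assert(n_sites > 0 && nv > 0);

    /* Thread-level parallelism: each thread handles a range of sites. */
    #pragma omp parallel for
    for (long x = 0; x < (long)n_sites; ++x) {
        size_t xs = (size_t)x;
        size_t xp = (xs + 1) % n_sites;            /* forward neighbour  */
        size_t xm = (xs + n_sites - 1) % n_sites;  /* backward neighbour */

        const float *center = in + xs * nv;
        const float *fwd    = in + xp * nv;
        const float *bwd    = in + xm * nv;
        float *res          = out + xs * nv;

        /* SIMD parallelism: the loop over right-hand sides has unit
         * stride and no cross-iteration dependence, so it can fill
         * the vector lanes. */
        #pragma omp simd
        for (size_t v = 0; v < nv; ++v) {
            res[v] = 2.0f * center[v] - fwd[v] - bwd[v];
        }
    }
}
```

With GCC or Clang the pragmas take effect when compiling with `-fopenmp`; without that flag they are ignored and the same source runs serially. In this sketch, choosing `nv` as a multiple of the hardware vector width (for example 16 single-precision lanes on a 512-bit unit) would let the inner loop map onto full vectors without a remainder loop.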
DOI: https://doi.org/10.22323/1.334.0033
Open Access