The upcoming exascale computers in the United States and elsewhere will have diverse node architectures, with or without compute accelerators, making it a challenge to maintain a code base that is performance portable across different systems. As part of the US Exascale Computing Project (ECP), the USQCD collaboration has embarked on a collaborative effort to prepare the lattice QCD software suites for exascale, with a particular focus on achieving performance portability across diverse exascale architectures.
In this presentation, I will focus on efforts to use compiler directives, OpenMP and OpenACC, to port the Grid C++ lattice QCD library to AMD/Intel/NVIDIA GPUs and multi/many-core CPUs. Performance comparisons with architecture-native implementations in HIP, SYCL and CUDA will be given. I will also discuss the problems encountered and pros and cons of using compiler directives for performance portability.
|Consider for long presentation||Yes|