OpenMP is a directive-based shared-memory parallel programming model traditionally used for multicore CPUs. Recent versions of OpenMP extend it to GPU computing via the “target
offloading” model. Its architecture-agnostic compiler directives can in principle offload to multiple types of GPUs and FPGAs, and compiler support for offloading is under active development.
In this work, we investigate the performance of OpenMP’s GPU offloading capability by porting the ATLAS FastCaloSim code. FastCaloSim is a relatively self-contained parametrized calorimeter
simulation, and is used as a testbed for our investigations of different portable programming models. We find OpenMP GPU offloading easy to implement, and it does not require major changes to the C++ code. However, performance varies from compiler
to compiler, and specialized operations (e.g. atomics) are currently less performant than in CUDA. We compare the performance with the existing CUDA port across hardware (NVIDIA, AMD) and compilers (LLVM Clang, AMD Clang, gcc, nvc).
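As an illustration of the kind of change involved, the following is a minimal sketch of an OpenMP target-offloaded loop that uses an atomic update. It is not taken from the FastCaloSim port: the function deposit_energy and its parameters are hypothetical, chosen only to resemble accumulating hit energies into calorimeter cells. When compiled without OpenMP or without an offload target, the pragmas are ignored or the region falls back to the host, so the same C++ source still runs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of FastCaloSim): sum the energy of each hit
// into the cell it belongs to. Several hits may land in the same cell, so the
// accumulation must be atomic when iterations run in parallel.
std::vector<float> deposit_energy(const std::vector<int>& cell_of_hit,
                                  const std::vector<float>& energy_of_hit,
                                  int n_cells) {
    assert(cell_of_hit.size() == energy_of_hit.size());

    std::vector<float> cells(n_cells, 0.0f);
    const int n_hits = static_cast<int>(cell_of_hit.size());

    // Raw pointers are used so that array sections can be named in map clauses.
    const int* cell = cell_of_hit.data();
    const float* energy = energy_of_hit.data();
    float* out = cells.data();

    // Offload the loop to the default device; inputs are copied to the device,
    // the output array is copied in both directions.
    #pragma omp target teams distribute parallel for \
        map(to: cell[0:n_hits], energy[0:n_hits]) map(tofrom: out[0:n_cells])
    for (int i = 0; i < n_hits; ++i) {
        // Atomic update: the kind of specialized operation whose cost
        // depends on the compiler and hardware.
        #pragma omp atomic update
        out[cell[i]] += energy[i];
    }
    return cells;
}
```

The loop body is unchanged from a serial version; only the two directives are added, which reflects why such a port needs few changes to existing C++ code. The offloading flags needed to build it differ between compilers and are not shown here.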
|Consider for long presentation||No|