The C++17 standard introduced parallel versions of the standard library algorithms, often referred to as std::par after the std::execution::par execution policy. The standard defines a number of execution policies that
target various levels of parallelization using threads and vectorization. While originally designed for CPUs, backends that can execute on GPUs have recently been introduced by NVIDIA and Intel. Unlike APIs such as CUDA, HIP or SYCL, which can explicitly target
low-level GPU features to achieve maximum performance, std::par offers a much higher level of abstraction, and should be considered more of a stepping stone between pure CPU code and code written explicitly for GPUs. As a stepping stone, however,
it provides a very low entry bar for programmers transitioning their code from single-core CPUs to multicore machines and to GPUs. Alpaka is a C++ header-only library that exploits task- and data-parallelism, and uses all levels of the accelerator memory hierarchy
to achieve performance portability of the application code. It has backend support for NVIDIA, AMD and Intel GPUs, as well as multicore/many-core CPUs.
In this submission we will discuss the process by which the ATLAS Fast Calorimeter Simulation (FCS), a small standalone codebase that uses shower parameterization techniques to quickly simulate
particle interactions in the ATLAS Liquid Argon Calorimeter, was separately ported to std::par and Alpaka. We will also compare the performance of these ports on CPUs and various GPUs with that of implementations in native GPU programming models such as CUDA, HIP and SYCL.