May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Fast, high-quality pseudo random number generators for heterogeneous computing

May 9, 2023, 11:45 AM
15m
Marriott Ballroom VII (Norfolk Waterside Marriott)

235 East Main Street, Norfolk, VA 23510

Speaker

Barbone, Marco (Imperial College London)

Description

Random number generation is key to many applications in a wide variety of disciplines. Depending on the application, the quality of the random numbers from a particular generator can directly impact both computational performance and, critically, the outcome of the calculation.

High-energy physics applications make wide use of Monte Carlo simulation and machine learning, both of which require high-quality random numbers. In recent years, to meet increasing performance requirements, many high-energy physics workloads have adopted GPU acceleration. While a wide variety of generators with different performance and quality characteristics exists for CPUs, the same cannot be said for GPU and FPGA accelerators.

On GPUs, the most common implementation is provided by cuRAND, an NVIDIA library that is neither open source nor peer reviewed by the scientific community. The highest-quality generator implemented in cuRAND is a version of the Mersenne Twister. Given the availability of better and faster random number generators, high-energy physics moved away from the Mersenne Twister several years ago, and nowadays MixMax is the standard generator in Geant4 via CLHEP.
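
As a concrete illustration of the CLHEP route mentioned above, the sketch below draws uniform deviates from CLHEP's MixMax engine; it assumes a CLHEP build that ships the MixMaxRng engine (version 2.3 or later), and the seed value is arbitrary.

    // Minimal sketch, assuming CLHEP >= 2.3 (which ships MixMaxRng); link with -lCLHEP.
    #include "CLHEP/Random/MixMaxRng.h"
    #include <iostream>

    int main() {
        CLHEP::MixMaxRng engine(12345);          // one seeded MixMax stream
        for (int i = 0; i < 5; ++i)
            std::cout << engine.flat() << '\n';  // uniform double in (0,1)
        // In a Geant4 application the same engine can be installed globally,
        // e.g. G4Random::setTheEngine(new CLHEP::MixMaxRng(12345));
    }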

The original MixMax design supports parallel streams through a seeding algorithm, which makes it especially well suited to GPUs and FPGAs, where extreme parallelism is a key factor. In this study we implement the MixMax generator on both architectures and analyze its suitability and applicability for accelerator implementations. On GPUs we evaluated the results against the “Mersenne Twister for a Graphic Processor” (MTGP32): MixMax achieved 5, 13, and 14 times higher throughput for state vector sizes of 240, 17, and 8, respectively. Coded in VHDL and implemented on Xilinx UltraScale+ FPGAs, the MixMax generator requires 50% fewer total LUTs than a 32-bit Mersenne Twister (MT19937), or ~75% fewer LUTs per output bit.
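
For context on the GPU comparison, the sketch below shows how one might time bulk uniform generation with cuRAND's MTGP32 host API, the baseline generator named above; the buffer size and timing harness are illustrative assumptions, not the authors' benchmark. A MixMax device implementation would be timed the same way, swapping only the generation call.

    // Hedged sketch: timing the MTGP32 baseline via cuRAND's host API.
    // Compile with: nvcc timing.cu -lcurand
    #include <curand.h>
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        const size_t n = size_t(1) << 26;        // 64M floats (~256 MiB), arbitrary
        float* d_out = nullptr;
        cudaMalloc(&d_out, n * sizeof(float));

        curandGenerator_t gen;
        curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_MTGP32);
        curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);

        cudaEvent_t start, stop;                 // GPU-side timing
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        curandGenerateUniform(gen, d_out, n);    // fill buffer with U(0,1)
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        std::printf("MTGP32: %.2f Gsamples/s\n", (n / 1e9) / (ms / 1e3));

        curandDestroyGenerator(gen);
        cudaFree(d_out);
    }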

In summary, the state-of-the-art MixMax pseudo random number generator has been implemented on GPU and FPGA platforms and its performance benchmarked.

Consider for long presentation: Yes

Primary authors

Barbone, Marco (Imperial College London)
Prof. Gaydadjiev, Georgi (University of Groningen)
Dr Howard, Alexander (Imperial College London)
Prof. Luk, Wayne (Imperial College London)
Prof. Savvidy, George (Demokritos National Research Center)
Dr Savvidy, Konstantin (National Centre For Scientific Research Demokritos)
Prof. Tapper, Alexander (Imperial College (GB))

Peer reviewing

Paper