The first stage of the LHCb High Level Trigger is implemented as a GPU application. In 2023 it will run on 400 NVIDIA GPUs, and its goal is to reduce the rate of incoming data from 5 TB/s to approximately 100 GB/s. A broad range of reconstruction algorithms is implemented in approximately 70 kernels. Machine learning algorithms are attractive for further extending the physics reach of the application, but their inference must be integrated into the existing GPU application and be very fast in order to maintain the required throughput of at least 10 GB/s per GPU. We investigate the use of NVIDIA TensorRT for flexible loading of machine learning models and fast inference, benchmark its performance for a range of relevant machine learning models, and compare it to hand-coded implementations where possible.
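To make the comparison concrete, a hand-coded baseline of the kind TensorRT inference can be benchmarked against might look like the sketch below: a plain forward pass of a small multilayer perceptron written directly with NumPy. The network shape (4-16-1), weights, and activation choices are hypothetical illustrations, not taken from the LHCb application, which runs such code as CUDA kernels rather than in Python.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Hand-coded forward pass of a small MLP: ReLU hidden layers,
    sigmoid output. A baseline of this kind can be timed against
    TensorRT inference of the same model."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)       # ReLU hidden layer
    logits = h @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid output in (0, 1)

# Hypothetical 4-16-1 classifier applied to a batch of 8 inputs
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 16)), rng.normal(size=(16, 1))]
biases = [np.zeros(16), np.zeros(1)]
out = mlp_forward(rng.normal(size=(8, 4)), weights, biases)
print(out.shape)  # (8, 1)
```

In a real benchmark, the same trained weights would be exported (e.g. via ONNX), built into a TensorRT engine, and both implementations timed on identical input batches.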