ALICE has upgraded many of its detectors for LHC Run 3 to operate in continuous readout mode, recording Pb-Pb collisions at an interaction rate of 50 kHz without a trigger.
This requires processing data in real time at rates 50 times higher than during Run 2. To tackle this challenge, we introduced O2, a new computing system with its associated infrastructure. Designed and implemented during the long shutdown, O2 is now in production and handles all the data processing needs of the experiment.
O2 is designed around the message-passing paradigm, which enables resilient, parallel data processing in both the synchronous (to the LHC beam) and asynchronous data taking and processing phases.
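As a generic illustration of the message-passing idea, and not of the O2 framework or its API, the following minimal sketch chains independent processing stages that share no state and communicate only through queues. All names (`stage`, `run_pipeline`, `SENTINEL`) are hypothetical, and threads with in-process queues stand in for whatever processes and transport a real system would use.

```python
import queue
import threading

# Hypothetical end-of-stream marker; a unique object avoids clashing with data.
SENTINEL = object()


def stage(func, inbox, outbox):
    """Consume messages from inbox, apply func, and forward results to outbox."""
    while True:
        msg = inbox.get()
        if msg is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            break
        outbox.put(func(msg))


def run_pipeline(messages, funcs):
    """Run one worker thread per function; stages are linked only by queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for w in workers:
        w.start()

    # Feed the first stage, then signal the end of the stream.
    for m in messages:
        queues[0].put(m)
    queues[0].put(SENTINEL)

    # Drain the last queue until the shutdown marker arrives.
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)

    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    # Two toy stages standing in for, e.g., calibration and compression steps.
    print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2]))
```

Because each stage sees only its input and output queues, stages run concurrently on different messages, and a stage can be replaced or restarted without touching the others. This is the general property that makes the paradigm attractive for resilient parallel processing.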
The main purpose of the synchronous online reconstruction is detector calibration and raw data compression. This synchronous processing is dominated by the TPC detector, which produces by far the largest data volume; TPC reconstruction runs entirely on GPUs.
When there is no beam in the LHC, the powerful GPU-equipped online computing farm of ALICE is used for the asynchronous reconstruction, which creates the final reconstruction output for analysis from the compressed raw data.
Since most of the compute capacity of the online farm is in the GPUs, and since the asynchronous processing is not dominated by the TPC in the way the synchronous processing is, there is an ongoing effort to offload a significant share of the compute load of other detectors to the GPUs as well.
The talk will present the experience from running the O2 framework in production during the 2022 ALICE data taking, with particular regard to the GPU usage, an overview of the current state and the plans for the asynchronous reconstruction, and the current performance of synchronous and asynchronous reconstruction with GPUs for pp and Pb-Pb data.