The ATLAS experiment Data Acquisition (DAQ) system will be extensively upgraded to fully exploit the High-Luminosity LHC (HL-LHC) upgrade, allowing it to record data at unprecedented rates. The detector will be read out at 1 MHz, generating over 5 TB/s of data. This design poses significant challenges for the Ethernet-based network, which will be required to transport 20 times more data than during Run 3. The higher data rate, larger data sizes, and greater number of servers will exacerbate the TCP incast effect observed in the past, which prevents the capabilities of the network from being fully exploited and limits the performance of the processing farm.
We present exhaustive and systematic experiments to define the buffer requirements of network equipment needed to minimise the effects of TCP incast and reduce their impact on the processing applications. Three switch models were stress-tested with DAQ traffic patterns in a test environment at approximately 10% of the scale of the expected HL-LHC DAQ system.
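To illustrate why incast drives buffer requirements, the following is a minimal back-of-envelope sketch, not the model or method used in this study. It assumes (hypothetically) that N senders each hold one equal fragment of an event and start sending to the same receiver at the same instant at full link rate, and that the receiver's switch egress port drains at a fixed rate with no TCP congestion control or competing traffic. The average event size follows from the quoted figures (5 TB/s at 1 MHz gives about 5 MB); the sender count and link speeds in the usage lines are illustrative assumptions, not HL-LHC design values.

```python
"""Back-of-envelope estimate of switch buffer occupancy during a TCP incast burst.

Hypothetical assumptions, for illustration only:
- n_senders each hold one equal fragment of an event and all start sending
  to the same receiver at the same instant, at their full link rate.
- The receiver's switch egress port drains at a fixed rate; TCP congestion
  control, pacing and competing traffic are not modelled.
"""


def average_event_size_bytes(throughput_bytes_per_s: float, readout_rate_hz: float) -> float:
    """Average event size implied by aggregate throughput and readout rate."""
    return throughput_bytes_per_s / readout_rate_hz


def incast_peak_buffer_bytes(n_senders: int, fragment_bytes: float,
                             ingress_bps: float, egress_bps: float) -> float:
    """Peak queue at the receiver's egress port for one synchronized burst.

    Each sender needs fragment_bytes * 8 / ingress_bps seconds to push its
    fragment, so the whole burst (n_senders * fragment_bytes) arrives within
    that time, while the egress port drains only egress_bps * time / 8 bytes.
    Whatever is not drained must be held in the switch buffer (or dropped).
    """
    arrival_time_s = fragment_bytes * 8 / ingress_bps
    drained_bytes = egress_bps * arrival_time_s / 8
    return max(0.0, n_senders * fragment_bytes - drained_bytes)


if __name__ == "__main__":
    # 5 TB/s read out at 1 MHz -> about 5 MB per event (from the abstract).
    event = average_event_size_bytes(5e12, 1e6)
    # Hypothetical: 100 senders, 100 Gb/s links on both sides of the switch.
    n = 100
    peak = incast_peak_buffer_bytes(n, event / n, 100e9, 100e9)
    print(f"peak buffer per burst: {peak / 1e6:.2f} MB")
```

Under these hypothetical numbers almost the whole event (about 4.95 MB) must be queued at a single egress port, and the peak approaches the full event size as the number of senders grows. This simple model shows why buffer depth, rather than link speed alone, becomes the limiting factor as the system scales.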
As the hardware intended for the HL-LHC system is not yet available and the test lab is considerably smaller, the tests aim to extrapolate buffer requirements across different parameters. Different solutions are analysed, comparing the cost-to-performance ratios of software-based and network-hardware approaches to determine the most effective way to mitigate the impact of TCP incast.
The results of these evaluations will contribute to the decision-making process of acquiring network hardware for the HL-LHC DAQ.