The development of an LHC physics analysis involves numerous investigations that require the repeated processing of terabytes of measured and simulated data. Thus, a rapid processing turnaround is beneficial to the scientific process. We identified two bottlenecks in analysis-independent algorithms and developed the following solutions.
First, inputs are now cached in an individual SSD cache on each worker node. Cache efficiency and longevity are increased by a cache-aware workload scheduling algorithm, which is also resilient against changes in workload composition and worker node allocation.
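As an illustration of what cache-aware scheduling can look like, the following minimal sketch assigns each input file to a worker node by rendezvous (highest-random-weight) hashing. This is a stand-in technique chosen for the example, not necessarily the algorithm used in this work; the function names, the node and file names, and the `max_per_worker` capacity limit are all hypothetical.

```python
import hashlib
from typing import Dict, List


def _score(worker: str, input_file: str) -> int:
    """Deterministic pseudo-random affinity between a worker and an input file."""
    digest = hashlib.sha256(f"{worker}|{input_file}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def preferred_worker(input_file: str, workers: List[str]) -> str:
    """Worker with the highest affinity for this file (rendezvous hashing)."""
    return max(workers, key=lambda w: _score(w, input_file))


def assign(files: List[str], workers: List[str], max_per_worker: int) -> Dict[str, List[str]]:
    """Place each file on its highest-affinity worker that still has capacity.

    Assumption: one task per input file and a fixed per-worker capacity.
    """
    plan: Dict[str, List[str]] = {w: [] for w in workers}
    for f in files:
        ranked = sorted(workers, key=lambda w: _score(w, f), reverse=True)
        for w in ranked:
            if len(plan[w]) < max_per_worker:
                plan[w].append(f)
                break
        else:
            raise RuntimeError("not enough capacity for all files")
    return plan


files = [f"file_{i}.root" for i in range(50)]
workers = ["node01", "node02", "node03", "node04"]
plan = assign(files, workers, max_per_worker=20)
```

Because a file's affinity for a node does not depend on which other nodes exist, removing a node only relocates the files that preferred that node; all other files keep hitting the cache they already populated.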
Second, the overall throughput is increased through tailored resource allocation that maximizes utilization. To this end, result aggregation (in particular of histograms) and DNN evaluation are transparently offloaded to dedicated resources that satisfy their specific demands. Consequently, the resource needs of the primary workload are homogenized.
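Histogram aggregation lends itself to offloading because partial histograms with identical binning can be summed bin by bin in any order. The sketch below shows this property with plain Python lists; the functions `fill` and `merge`, the binning, and the sample values are assumptions made for the example and do not reflect the actual implementation.

```python
from typing import Iterable, List


def fill(values: Iterable[float], n_bins: int, lo: float, hi: float) -> List[int]:
    """Fill a fixed-binning histogram on a worker; values outside [lo, hi) are dropped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            # Clamp to the last bin to guard against floating-point rounding.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def merge(partials: Iterable[List[int]]) -> List[int]:
    """Sum partial histograms bin by bin, e.g. on a dedicated aggregation node."""
    total: List[int] = []
    for part in partials:
        if not total:
            total = list(part)
        elif len(part) != len(total):
            raise ValueError("partial histograms must share the same binning")
        else:
            total = [a + b for a, b in zip(total, part)]
    return total


# Three workers each process one chunk of events and ship only their bin counts.
chunks = [[0.5, 1.5, 2.5], [1.2, 3.9], [0.1, 4.0]]
partials = [fill(chunk, n_bins=4, lo=0.0, hi=4.0) for chunk in chunks]
merged = merge(partials)
```

The merged result equals the histogram that a single process would fill from all events, so the primary workers only need to hold their own partial counts in memory.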
With these measures, a full-fledged LHC Run 2 analysis can be reprocessed from scratch within a few days on a small institute cluster of about 200 logical cores. The individual analysis parts, which are often repeated during development and debugging, have their runtime reduced from hours to minutes, with measured speedups of up to 1490%. Finally, all these improvements readily carry over to other analyses within the same environment.