May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Analysis and optimization of ALICE Run 3 multicore Grid jobs

May 9, 2023, 3:15 PM
15m
Marriott Ballroom II-III (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Oral, Track 4 - Distributed Computing

Speaker

Bertran Ferrer, Marta (CERN)

Description

For LHC Run 3, the ALICE experiment software stack has been completely refactored and now supports multicore job execution. The new multicore jobs spawn multiple processes and threads within the payload. Because some of these processes may be short-lived, accounting for their resource consumption is a challenge. This paper presents the newly developed methodology for payload execution monitoring, which correctly accounts for the resources used by all processes within the payload.
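
To illustrate why short-lived processes are hard to account for, and one generic way around it, the sketch below measures the CPU time of an entire process tree after the fact rather than by sampling live processes. It is not the monitoring methodology developed for ALICE, whose details are not given in this abstract. It relies on the POSIX behaviour that the resource usage of terminated, waited-for children is accumulated in the parent's RUSAGE_CHILDREN counters. The names `run_and_account` and `PAYLOAD` are hypothetical, and the stand-in payload is a toy command, not an ALICE job.

```python
import resource
import subprocess
import sys

# Hypothetical stand-in payload: a parent process that spawns one
# short-lived, CPU-burning child and waits for it to finish.
PAYLOAD = [
    sys.executable,
    "-c",
    "import subprocess, sys; "
    "subprocess.run([sys.executable, '-c', "
    "'sum(i * i for i in range(2000000))'])",
]


def run_and_account(cmd):
    """Run cmd and return (exit_code, cpu_seconds) for its process tree.

    CPU time is the difference in RUSAGE_CHILDREN before and after the
    run. Descendants are included only if every intermediate parent
    waited for them, which is an assumption about the payload.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    completed = subprocess.run(cmd, check=False)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return completed.returncode, user + system


if __name__ == "__main__":
    code, cpu = run_and_account(PAYLOAD)
    print(f"exit code {code}, CPU time {cpu:.3f} s")
```

A periodic sampler that polls the process table could miss the inner child entirely if it exits between two samples, whereas the kernel-side accumulation used here still counts it. The trade-off is that the numbers only become available once the processes have exited and been reaped.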

We also present a black-box analysis of the new multicore experiment software framework, tracing the resources used and the system function calls issued by Monte Carlo simulation jobs. This analysis identified multiple sources of overhead in the process and thread lifecycle. This paper describes the tracing techniques and the solutions implemented to address these overheads. The analysis and subsequent code improvements have reduced the resource consumption and the overall turnaround time of the payloads, with a notable 35% reduction in execution time for a reference production job. We also describe how this methodology will be used to further improve the efficiency of our experiment software and which other optimization avenues are currently being pursued.
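
As a reading aid for the 35% figure above, a reduction in execution time can be converted into an equivalent speedup factor. The short sketch below does only that arithmetic; the function name `speedup_from_reduction` is hypothetical, and the only input taken from the abstract is the 35% reduction for the reference production job.

```python
def speedup_from_reduction(reduction):
    """Convert a fractional reduction in execution time into a speedup factor.

    If the new time is (1 - reduction) times the old time, the same job
    completes 1 / (1 - reduction) times faster.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    # 35% reduction quoted for the reference production job
    print(f"speedup: {speedup_from_reduction(0.35):.2f}x")
```

For a 35% reduction this gives a factor of roughly 1.54, meaning the same slot could complete about half again as many such jobs in a given time, assuming all jobs benefit equally.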

Consider for long presentation: No
