May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

The integration of heterogeneous resources in the CMS Submission Infrastructure for the LHC Run 3 and beyond

May 11, 2023, 12:15 PM
15m
Marriott Ballroom II-III (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Oral Track 4 - Distributed Computing

Speaker

Dr Pérez-Calero Yzquierdo, Antonio (CIEMAT - PIC)

Description

The computing resources supporting the research programmes of the LHC experiments are still dominated by x86 processors deployed at WLCG sites. This will, however, evolve in the coming years, as the collaborations employ a growing number of HPC and Cloud facilities to process the vast amounts of data to be collected in LHC Run 3 and into the HL-LHC phase. Compute power in these facilities typically includes a significant (or even dominant) fraction of non-x86 components, such as alternative CPU architectures (ARM, Power) and a variety of GPU specifications. Using these heterogeneous resources efficiently will therefore be essential for the LHC collaborations to reach their scientific goals.

The Submission Infrastructure (SI) is a central element of the CMS Offline Computing model, enabling resource acquisition and exploitation by CMS data processing, simulation and analysis tasks. The SI is implemented as a set of federated HTCondor dynamic pools, which must therefore be adapted to ensure access to, and optimal usage of, alternative processors and coprocessors such as GPUs. The resource provisioning and workload management tools and strategies used by the CMS SI team must address questions such as the optimal level of granularity in the description of the resources and how to prioritize the diverse CMS workflows in relation to the new resource mix.

Some steps towards exploiting this greater resource heterogeneity have already been taken. For example, CMS is already opportunistically using a pool of GPU slots provided mainly at the CMS WLCG sites. Additionally, Power processors have been validated for CMS production at the Marconi100 cluster at CINECA.

This contribution will describe the updated capabilities of the SI to continue ensuring the efficient allocation and use of computing resources by CMS, despite their increasing diversity. The next steps towards full integration and support of heterogeneous resources according to CMS needs will also be reported.
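
The two questions raised in the abstract (how finely to describe resources, and how to prioritize workflows against a mixed resource pool) can be made concrete with a toy matchmaking sketch. This is only an illustration under stated assumptions: `Slot`, `Job`, `matches` and `assign` are hypothetical names invented here, the resource description is reduced to architecture, CPU count and GPU count, and none of it reflects the HTCondor API or the actual CMS SI implementation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Slot:
    """Hypothetical, coarse description of one resource slot."""
    name: str
    arch: str        # e.g. "x86_64", "aarch64", "ppc64le"
    cpus: int
    gpus: int = 0


@dataclass(frozen=True)
class Job:
    """Hypothetical job request; `archs` lists architectures it may run on."""
    name: str
    archs: FrozenSet[str]
    cpus: int
    gpus: int = 0
    priority: int = 0


def matches(job: Job, slot: Slot) -> bool:
    """A slot fits if the architecture is allowed and it has enough CPUs and GPUs."""
    return (
        slot.arch in job.archs
        and slot.cpus >= job.cpus
        and slot.gpus >= job.gpus
    )


def assign(jobs: List[Job], slots: List[Slot]) -> Dict[str, Optional[str]]:
    """Greedy assignment: highest priority first, one job per slot.

    Among fitting slots, pick the one with the fewest surplus GPUs, then the
    fewest surplus CPUs, so that CPU-only jobs do not occupy scarce GPU slots
    when a plain CPU slot is available.
    """
    free = list(slots)
    result: Dict[str, Optional[str]] = {}
    for job in sorted(jobs, key=lambda j: -j.priority):
        candidates = [s for s in free if matches(job, s)]
        if not candidates:
            result[job.name] = None  # no compatible slot is currently free
            continue
        best = min(candidates, key=lambda s: (s.gpus - job.gpus, s.cpus - job.cpus))
        free.remove(best)
        result[job.name] = best.name
    return result
```

In this sketch, a job validated for several architectures simply lists them all in `archs`, which is one possible answer to the granularity question; a finer description (GPU model, memory, and so on) would add fields to `Slot` and conditions to `matches`.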

Consider for long presentation: No

Primary authors

Mascheroni, Marco (University of California San Diego)
Mr Tsipinakis, Nikos (CERN)
Mr Haleem, Saqib (National Centre for Physics, Islamabad Pakistan)
Mrs Kizinevic, Edita (CERN)
Mr Kim, Hyunwoo (FNAL)
Mr Khan, Farrukh Aftab (FNAL)
Mrs Acosta Flechas, María (FNAL)
Dr Pérez-Calero Yzquierdo, Antonio (CIEMAT - PIC)
