May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

HPC resources for CMS offline computing: an integration and scalability challenge for the Submission Infrastructure

Not scheduled
1h
Hampton Roads Ballroom and Foyer Area (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Poster, Poster Session

Speaker

Dr Pérez-Calero Yzquierdo, Antonio (CIEMAT - PIC)

Description

The computing resource needs of LHC experiments such as CMS are expected to keep growing significantly over the next decade, during Run 3 and especially in the HL-LHC era. The landscape of available resources will also evolve: HPC (and Cloud) resources are expected to provide a comparable, or even dominant, fraction of the total capacity, in contrast with the current situation, which is dominated by contributions from WLCG sites. The coming years therefore present a challenge for the experiments’ resource provisioning and allocation models, in terms of both scalability and increasing complexity.

The CMS Submission Infrastructure (SI) is the main computing resource provisioning system for offline CMS workflows, including data processing, simulation and analysis. The SI team manages a set of federated HTCondor pools, currently aggregating around 400k CPU cores distributed worldwide and supporting the simultaneous execution of over 200k CMS computing tasks.

Incorporating HPC resources into CMS offline computing is, first, an integration challenge, as HPC centers are much more diverse than traditional Grid resources in their technical capabilities and limitations. Second, while the present infrastructure is sufficient to harness the current computing power, maintaining global flexibility and efficiency of use at the resource scales required for the HL-LHC phase may also prove challenging for the CMS SI. To detect performance degradation driven by scalability barriers early, and to overcome it, the SI team regularly runs tests that explore the scalability reach of our infrastructure.

In this contribution, we will describe the diverse methods by which HPC resources are being integrated into the CMS offline infrastructure, presenting a number of already successful cases, such as HPC centers joining the CMS SI as transparent extensions of existing WLCG sites. We will also report on the results of tests for potential scalability limitations of our infrastructure.

Consider for long presentation: No

Primary authors

Mascheroni, Marco (University of California San Diego)
Mr Tsipinakis, Nikos (CERN)
Mrs Kizinevic, Edita (CERN)
Mr Haleem, Saqib (National Centre for Physics, Islamabad Pakistan)
Mr Kim, Hyunwoo (FNAL)
Mr Khan, Farrukh Aftab (FNAL)
Mrs Acosta Flechas, María (FNAL)
Hufnagel, Dirk (Fermilab)
Dr Pérez-Calero Yzquierdo, Antonio (CIEMAT - PIC)
