
May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Repurposing of the Run 2 CMS High Level Trigger Infrastructure as a Cloud Resource for Offline Computing

May 9, 2023, 2:30 PM
Hampton Roads Ballroom VI (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Oral, Track 3 - Offline Computing


Mascheroni, Marco (University of California San Diego)


The former CMS Run 2 High Level Trigger (HLT) farm is one of the largest contributors to CMS compute resources, providing about 30k job slots for offline computing. The role of this farm has evolved from an opportunistic resource, exploited during inter-fill periods of LHC Run 2, to a nearly transparent extension of the CMS capacity at CERN during LS2 and into LHC Run 3, which started in 2022. This “permanent cloud” is located on-site at LHC interaction point 5, where the CMS detector is installed. As a critical example, the execution of Tier 0 tasks, such as prompt detector data reconstruction, has been fully commissioned. This resource can therefore be used in combination with the dedicated Tier 0 capacity at CERN to process and absorb peaks in the stream of data coming from the CMS detector, as well as to contribute to the prompt reconstruction of a substantial fraction of the “parked data sample”, dedicated primarily to B physics studies.

The initial deployment model for this resource, based on long-lived, statically configured VMs running HTCondor execution node services connected to the CMS Submission Infrastructure (SI), provided the required level of functionality for its exploitation in offline computing. However, this configuration was less flexible than pilot-based resource acquisition at the WLCG sites. For example, slot defragmentation techniques were required to enable the matching of multicore Tier 0 jobs. Additionally, the fair-share quotas and priorities for the diverse CMS tasks could not be directly managed by the CMS SI team, which is in charge of enforcing the global CMS resource provisioning and exploitation policies.

A new configuration of this permanent cloud has been proposed to solve these shortcomings. A vacuum-like model, based on GlideinWMS pilot jobs joining the CMS CERN HTCondor pool, has been prototyped, successfully tested, and deployed.
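As a rough illustration of the two deployment models (all host names, knob values, and file names below are hypothetical, not the actual CMS or GlideinWMS configuration): in the static model each VM runs a permanently configured condor_startd that joins the pool, which is why defragmentation is needed to reassemble multicore slots; in the vacuum-like model a pilot is submitted as an ordinary job and its payload launches a short-lived startd.

```
# Static model (sketch): VM-local HTCondor configuration. The startd is a
# permanent pool member, so matching multicore Tier 0 jobs may require the
# condor_defrag daemon to coalesce fragmented single-core slots.
DAEMON_LIST = MASTER, STARTD
CONDOR_HOST = pool.example.cern.ch          # hypothetical central manager
SLOT_TYPE_1 = cpus=100%                     # one partitionable slot per VM
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1

# Vacuum-like model (sketch): the pilot arrives as a normal batch job whose
# payload starts a short-lived startd joining the CERN pool, so slot
# lifetime, quotas, and priorities are handled by the pilot/SI layer.
# (Illustrative submit description only, not real GlideinWMS configuration.)
# executable = glidein_startup.sh
# arguments  = --pool pool.example.cern.ch --lifetime 48h
# queue
```

The practical difference the abstract highlights is who controls the slots: in the first sketch the VM's local configuration does, while in the second the submission infrastructure does, via the pilots it chooses to send.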
This contribution describes the redeployment of the permanent cloud for enhanced support of CMS offline computing, comparing the functionalities of the former and new models, along with the commissioning effort for the new setup.

Consider for long presentation: No

Primary authors

Mascheroni, Marco (University of California San Diego)
Pérez-Calero Yzquierdo, Antonio (CIEMAT - PIC)
Haleem, Saqib (National Centre for Physics, Islamabad, Pakistan)
Tsipinakis, Nikos (CERN)
Kizinevic, Edita (CERN)
Khan, Farrukh Aftab (FNAL)
Kim, Hyunwoo (FNAL)
