May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Utilizing Distributed Heterogeneous Computing with PanDA in ATLAS

May 11, 2023, 12:00 PM
15m
Marriott Ballroom II-III (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Oral, Track 4 - Distributed Computing

Speaker

Barreiro Megino, Fernando (Unive)

Description

In recent years, advanced and complex analysis workflows have gained increasing importance in the ATLAS experiment at CERN, one of the large scientific experiments at the Large Hadron Collider (LHC). Support for such workflows has allowed users to exploit remote computing resources and service providers distributed worldwide, overcoming the limitations of local resources and services. The spectrum of computing options keeps growing: WLCG resources, volunteer computing, high-performance and leadership computing facilities, commercial clouds, and emerging service levels such as Platform-as-a-Service (PaaS), Container-as-a-Service (CaaS) and Function-as-a-Service (FaaS), each bringing its own advantages and constraints. Users can benefit significantly from these providers, but dealing with several of them is cumbersome, even within a single analysis workflow, because each application imposes fine-grained requirements that stem from its nature and characteristics.
In this presentation we will first highlight issues in distributed heterogeneous computing, such as insulating users from the complexities of distributed heterogeneous providers, complex resource provisioning for hybrid CPU and GPU applications, integration of PaaS, CaaS, and FaaS providers, smart workload routing, automatic data placement, seamless execution of complex workflows, interoperability between pledged and user resources, and on-demand data production. We will then present solutions developed in ATLAS with the Production and Distributed Analysis (PanDA) system, and future challenges for LHC Run 4.

Consider for long presentation: No

Primary authors

Maeno, Tadashi (BNL)
Barreiro Megino, Fernando (Unive)
De, Kaushik (University of Texas at Arlington)
Guan, Wen (Brookhaven National Laboratory)
Karavakis, Edward (Brookhaven National Laboratory)
Klimentov, Alexei (Brookhaven National Laboratory)
Nilsson, Paul (Brookhaven National Laboratory)
Wenaus, Torre (BNL)
Yang, Zhaoyu (Brookhaven National Laboratory)
Zhao, Xin (Brookhaven BNL)
Lin, Fa-Hui (Taipei AS)
