May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Accelerating science: the usage of commercial clouds in ATLAS distributed computing

May 9, 2023, 3:00 PM
15m
Marriott Ballroom IV (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Oral, Track 7 - Facilities and Virtualization

Speaker

Megino, Fernando Harald Barreiro (The University of Texas at Arlington)

Description

The ATLAS experiment at CERN is one of the largest scientific machines built to date, and its computing needs will keep growing as the Large Hadron Collider collects an increasingly large volume of data over the next 20 years. ATLAS is conducting R&D projects on the Amazon and Google clouds as complementary resources for distributed computing, focusing on some of the key features of commercial clouds: lightweight operation, elasticity, and availability of multiple chip architectures.

The proof-of-concept phases have concluded with a cloud-native, vendor-agnostic integration with the experiment’s data and workload management frameworks. The Google cloud has been used to evaluate elastic batch computing, ramping up ephemeral clusters of up to O(100k) cores to process tasks requiring quick turnaround. The Amazon cloud has been used for the successful physics validation of the Athena simulation software on ARM processors.
We have also set up an interactive facility for physics analysis that allows end users to spin up private, on-demand clusters for parallel computing with up to 4000 cores, or to run GPU-enabled notebooks and jobs for machine learning applications.

The success of the proof-of-concept phases has led to an extension of the Google cloud project, in which ATLAS will study the total cost of ownership of a production cloud site over 15 months with an average of 10k cores, fully integrated with distributed grid computing resources, and will continue the R&D projects.

Consider for long presentation: Yes

Primary authors

Megino, Fernando Harald Barreiro (The University of Texas at Arlington)
Borodin, Mikhail (University of Iowa)
De, Kaushik (The University of Texas at Arlington)
Elmsheuser, Johannes (Brookhaven National Laboratory)
Di Girolamo, Alessandro (CERN)
Hartmann, Nikolai (Fakultaet fuer Physik, Ludwig-Maximilians-Universitaet Muenchen)
Heinrich, Lukas (Max-Planck-Institut für Physik)
Klimentov, Alexei (Brookhaven National Laboratory)
Lassnig, Mario (CERN)
Lin, FaHui (University of Texas at Arlington)
Maeno, Tadashi (Brookhaven National Laboratory)
Marshall, Zachary (Lawrence Berkeley National Laboratory)
Merino, Gonzalo (Port d’Informació Científica)
Serfon, Cedric (Brookhaven National Laboratory)
South, David (DESY)
Bawa, Harinder Singh (California State University)
Sandesara, Jay (University of Massachusetts, Amherst)
Nilsson, Paul (Brookhaven National Laboratory)