May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Distributed Machine Learning with PanDA and iDDS in LHC ATLAS

May 8, 2023, 11:00 AM
15m
Marriott Ballroom II-III (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Oral, Track 4 - Distributed Computing

Speaker

Weber, Christian (Brookhaven National Laboratory)

Description

Machine learning has become an important tool for High Energy Physics analysis. As datasets at the Large Hadron Collider (LHC) grow, and as search spaces expand to exploit the full physics potential, these machine learning tasks require ever more computing resources. In addition, advanced machine learning workflows are being developed in which one task may depend on the results of previous tasks. How to use the vast distributed CPU and GPU resources of the WLCG (Worldwide LHC Computing Grid) for these large, complex machine learning tasks has become an active area of work. In this presentation, we describe our efforts on distributed machine learning in PanDA and iDDS (intelligent Data Delivery Service). We first address the difficulties of running machine learning tasks on distributed WLCG resources. We then present our implementation, which uses DAGs (Directed Acyclic Graphs) and sliced parameters in iDDS to distribute machine learning tasks across computing resources and execute them in parallel through PanDA. Next, we demonstrate use cases we have implemented, such as hyperparameter optimization, Monte Carlo toy confidence limit calculation, and active learning. Finally, we outline directions for future work.
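The idea of combining a DAG with sliced parameters can be illustrated with a small, purely local sketch. It is not the iDDS or PanDA API: the function names (`slice_parameters`, `evaluate_slice`, `run_dag`), the toy loss, and the use of a thread pool in place of grid jobs are all assumptions made for illustration. A parameter grid is cut into slices, each slice becomes an independent DAG node that can run in parallel, and a final "merge" node that depends on every slice picks the best point.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+
from itertools import product


def slice_parameters(grid, n_slices):
    """Expand the grid into points and deal them into n_slices slices."""
    points = [dict(zip(grid, values)) for values in product(*grid.values())]
    return [points[i::n_slices] for i in range(n_slices)]


def evaluate_slice(points):
    """Hypothetical stand-in for a remote training job: toy loss per point."""
    return [((p["lr"] - 0.1) ** 2 + (p["depth"] - 4) ** 2, p) for p in points]


def run_dag(grid, n_slices=3):
    slices = slice_parameters(grid, n_slices)
    slice_nodes = [f"slice_{i}" for i in range(n_slices)]
    # Each node maps to the set of nodes it depends on.
    dag = {name: set() for name in slice_nodes}
    dag["merge"] = set(slice_nodes)

    sorter = TopologicalSorter(dag)
    sorter.prepare()
    results = {}
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        while sorter.is_active():
            # All nodes whose dependencies are done can run concurrently.
            futures = {}
            for node in sorter.get_ready():
                if node == "merge":
                    scored = [s for name in slice_nodes for s in results[name]]
                    futures[node] = pool.submit(min, scored, key=lambda s: s[0])
                else:
                    index = int(node.split("_")[1])
                    futures[node] = pool.submit(evaluate_slice, slices[index])
            for node, future in futures.items():
                results[node] = future.result()
                sorter.done(node)
    return results["merge"]


if __name__ == "__main__":
    loss, point = run_dag({"lr": [0.01, 0.1, 1.0], "depth": [2, 4, 8]})
    print(loss, point)
```

In a real distributed setting the thread pool would be replaced by job submission to remote resources, and the dependency bookkeeping would be handled by the workflow system rather than by `graphlib`.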

Consider for long presentation: No

Primary authors

De, Kaushik (University of Texas at Arlington)
Guan, Wen (Brookhaven National Laboratory)
Karavakis, Edward (Brookhaven National Laboratory)
Klimentov, Alexei (Brookhaven National Laboratory)
Lin, Fa-Hui (University of Texas at Arlington)
Maeno, Tadashi (Brookhaven National Laboratory (US))
Megino, Fernando Harald Barreiro (The University of Texas at Arlington)
Nilsson, Paul (Brookhaven National Laboratory)
Weber, Christian (Brookhaven National Laboratory)
Wenaus, Torre (BNL)
Yang, Zhaoyu (Brookhaven National Laboratory)
Zhang, Rui
Zhao, Xin (Brookhaven National Laboratory (US))
