Machine learning has become one of the important tools for High Energy Physics analysis. As the size of the dataset increases at the Large Hadron Collider (LHC), and at the same time the search spaces become bigger and bigger in order to exploit the physics potentials, more and more computing resources are required for processing these machine learning tasks. In addition, complex advanced machine learning workflows are developed in which one task may depend on the results of previous tasks. How to make use of vast distributed CPUs/GPUs in WLCG for these big complex machine learning tasks has become a popular area. In this presentation, we will present our efforts on distributed machine learning in PanDA and iDDS (intelligent Data Delivery Service). We will at first address the difficulties to run machine learning tasks on distributed WLCG resources. Then we will present our implementation with DAG (Directed Acyclic Graph) and sliced parameters in iDDS to distribute machine learning tasks to distributed computing resources to execute them in parallel through PanDA. Next we will demonstrate some use cases we have implemented, such as Hyperparameter Optimization, Monte Carlo Toy confidence limits calculation and Active Learning. Finally we will describe some directions to perform in the future.
|Consider for long presentation||No|