May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Benchmarking distributed-RDataFrame with CMS analysis workflows on the INFN analysis infrastructure

May 9, 2023, 4:30 PM
Hampton Roads Ballroom VIII (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Oral, Track 6 - Physics Analysis Tools


Spiga, Daniele (INFN)


The challenges expected in the HL-LHC era are pushing the LHC experiments to rethink their computing models at many levels. The evolution toward solutions that allow an effortless interactive analysis experience is one of the topics the CMS experiment follows closely. In this context, ROOT RDataFrame offers a high-level, lazy programming model that makes it a flexible and user-friendly tool for HEP analysis workflows. To support this paradigm shift further, a distributed infrastructure that leverages Dask to offload interactive payloads has been set up in production on INFN resources, transparently integrating Grid, clouds and possibly HPC. It was then a natural fit to combine the two efforts to get a glimpse of what a Phase-2 analysis might look like. The presented work provides an overview of the main technologies involved and describes the results of the first benchmark, which uses the analysis of Vector Boson Scattering (VBS) of same-sign W boson pairs with one hadronically decaying tau lepton and one light lepton (electron or muon) in the final state. The analysis workflow includes systematic variations as well as pre- and post-selection phases. The proposed comparison between a “legacy” batch-based strategy and the interactive RDataFrame approach is based on several metrics, from event throughput to resource consumption. To achieve a fair comparison, both cases have been executed running the same analysis on the very same set of resources hosted at the INFN distributed analysis facility.
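
As a rough illustration of what a "lazy programming model" means in practice, the toy sketch below records selections and derived columns and runs the event loop only when a result is requested. It is a pure-Python stand-in written for this note, not ROOT's RDataFrame API: the `LazyFrame` class, its method names, and the event fields (`tau_pt`, `lep_pt`, `same_sign`) are all hypothetical. Unlike RDataFrame, this toy re-runs the loop for every requested result.

```python
class LazyFrame:
    """Toy lazy dataframe (hypothetical): steps are recorded, not executed."""

    def __init__(self, rows, ops=()):
        self._rows = rows        # list of dicts, one dict per event
        self._ops = tuple(ops)   # recorded ("filter", f) / ("define", name, f) steps

    def filter(self, predicate):
        # Record the selection; no event is read here.
        return LazyFrame(self._rows, self._ops + (("filter", predicate),))

    def define(self, name, func):
        # Record a derived column; no event is read here.
        return LazyFrame(self._rows, self._ops + (("define", name, func),))

    def _run(self):
        # The event loop: apply the recorded steps in order to each event.
        for row in self._rows:
            event = dict(row)
            keep = True
            for op in self._ops:
                if op[0] == "filter":
                    if not op[1](event):
                        keep = False
                        break
                else:
                    _, name, func = op
                    event[name] = func(event)
            if keep:
                yield event

    def count(self):
        # Requesting a result triggers the event loop.
        return sum(1 for _ in self._run())

    def values(self, name):
        # Requesting a result triggers the event loop.
        return [event[name] for event in self._run()]


calls = []  # records each evaluation of the derived column


def pt_sum(event):
    calls.append(1)
    return event["tau_pt"] + event["lep_pt"]


# Made-up events, for illustration only.
events = [
    {"tau_pt": 40.0, "lep_pt": 30.0, "same_sign": True},
    {"tau_pt": 25.0, "lep_pt": 10.0, "same_sign": False},
    {"tau_pt": 60.0, "lep_pt": 45.0, "same_sign": True},
]

selected = (
    LazyFrame(events)
    .filter(lambda e: e["same_sign"])
    .define("pt_sum", pt_sum)
)
calls_before_trigger = len(calls)  # still 0: nothing has run yet

n_selected = selected.count()
pt_sums = selected.values("pt_sum")
```

Declaring the whole chain before any data is read is what lets a backend decide where and how to run it, for example by splitting the input across distributed workers.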

Consider for long presentation: No

Primary authors

Spiga, Daniele (INFN)
Ciangottini, Diego (INFN Perugia)
Mr Tedeschi, Tommaso (INFN & Università degli studi di Perugia)
Dr Padulano, Vincenzo Eduardo (CERN)
Dr Tejedor Saavedra, Enric (CERN)
Mr Biasotto, Massimo (INFN)
Dr Guiraud, Enrico (CERN)
