May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Automatising open data publishing workflows: experience with CMS open data curation

May 9, 2023, 2:45 PM
Marriott Ballroom I (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Oral
Track 8 - Collaboration, Reinterpretation, Outreach and Education


Dr Simko, Tibor (CERN)


In this paper we discuss the CMS open data publishing workflows, summarising experience with eight releases of CMS open data on the CERN Open Data portal since its initial launch in 2014. We present the recent enhancements of data curation procedures, including (i) mining information about collision and simulated datasets with accompanying generation parameters and processing configuration files, (ii) building an API service covering information related to luminosity, run number ranges and other contextual dataset information, as well as (iii) configuring the CERN Open Data storage area as a Rucio endpoint that manages over four petabytes of released CMS open data and serves as a WLCG Tier 3 site to simplify data transfers. Finally, we discuss the latest CMS content released as open data (completed Run 1 data, first samples from Run 2 data) and the associated runnable analysis examples demonstrating its use in containerised data analysis workflows. We conclude with a short list of lessons learnt and general recommendations to facilitate upcoming releases of Run 2 data.

Consider for long presentation: No

Primary authors

Dr Lassila-Perini, Kati (Helsinki Institute of Physics)
Dr Simko, Tibor (CERN)
