26TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY & NUCLEAR PHYSICS (CHEP2023)

Name: 26TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY & NUCLEAR PHYSICS (CHEP2023)
Start: 2023-05-08T08:00:00-04:00
End: 2023-05-12T16:00:00-04:00
Location: Norfolk Waterside Marriott

US/Eastern timezone

Conference Secretariat

chep2023-secretariat@jlab.org

Automatising open data publishing workflows: experience with CMS open data curation

May 9, 2023, 2:45 PM

15m

Marriott Ballroom I (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510

Oral

Track 8 - Collaboration, Reinterpretation, Outreach and Education

Dr Simko, Tibor (CERN)

In this paper we discuss the CMS open data publishing workflows, summarising experience with eight releases of CMS open data on the CERN Open Data portal since its initial launch in 2014. We present recent enhancements to the data curation procedures, including (i) mining information about collision and simulated datasets together with the accompanying generation parameters and processing configuration files, (ii) building an API service covering luminosity, run number ranges and other contextual dataset information, and (iii) configuring the CERN Open Data storage area as a Rucio endpoint that manages over four petabytes of released CMS open data and serves as a WLCG Tier 3 site to simplify data transfers. Finally, we discuss the latest CMS content released as open data (completed Run 1 data, first samples from Run 2 data) and the associated runnable analysis examples demonstrating its use in containerised data analysis workflows. We conclude with a short list of lessons learnt and general recommendations to facilitate upcoming releases of Run 2 data.

Consider for long presentation: No

Dr Lassila-Perini, Kati (Helsinki Institute of Physics); Dr Simko, Tibor (CERN)

chep2023-opendata-cms-slides.pdf
