26TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY & NUCLEAR PHYSICS (CHEP2023)

Name: 26TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY & NUCLEAR PHYSICS (CHEP2023)
Start: 2023-05-08T08:00:00-04:00
End: 2023-05-12T16:00:00-04:00
Location: Norfolk Waterside Marriott

US/Eastern timezone

Conference Secretariat

chep2023-secretariat@jlab.org

Analysis Productions: A declarative approach to ntupling

May 9, 2023, 5:45 PM

15m

Hampton Roads Ballroom VIII (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510

Oral, Track 6 - Physics Analysis Tools

Speaker: Burr, Chris (CERN)

Most analyses in the LHCb experiment start by filtering data and simulation stored on the Worldwide LHC Computing Grid (WLCG). Traditionally this has been achieved by submitting user jobs that each process a small fraction of the total dataset. While this has worked well, it has become increasingly complex as the LHCb datasets have grown, and it requires all analysts to understand the intricacies of the grid. This model also burdens individuals with documenting how each file was processed.

Here we present a more robust and efficient approach, known within LHCb as Analysis Productions. Filtering LHCb datasets to create ntuples is done by creating a merge request in GitLab, which is then tested automatically on a small subset of the data using Continuous Integration. Results of these tests are exposed via a dedicated website that aggregates the most important details. Once the merge request is reviewed and accepted, productions are submitted and run automatically using the power of the DIRAC transformation system. The output data is stored on grid storage and tools are provided to make it easily accessible for analysis.
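The workflow described above (declare the jobs, test each on a small subset, submit the full productions only after tests pass and the request is accepted) can be illustrated with a minimal sketch. It is not the LHCb implementation: `JobSpec`, `ci_subset`, `process_merge_request` and the `run_test`, `approved` and `submit` parameters are hypothetical names chosen for this example, and the real system uses GitLab CI and the DIRAC transformation system rather than plain Python callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class JobSpec:
    """Hypothetical declaration of one ntupling job (not the real LHCb schema)."""
    name: str
    input_files: List[str]
    options: str  # path to the options file that defines the filtering


def ci_subset(job: JobSpec, max_files: int = 1) -> JobSpec:
    """Return a copy of the job restricted to its first few input files."""
    return JobSpec(job.name, job.input_files[:max_files], job.options)


def process_merge_request(
    jobs: List[JobSpec],
    run_test: Callable[[JobSpec], bool],
    approved: bool,
    submit: Callable[[JobSpec], None],
) -> Tuple[str, Dict[str, bool]]:
    """Test every job on a subset, then submit the full jobs only if
    all tests passed and the request was approved by a reviewer."""
    results = {job.name: run_test(ci_subset(job)) for job in jobs}
    if approved and all(results.values()):
        for job in jobs:
            submit(job)  # stand-in for handing the job to the production system
        return "submitted", results
    return "blocked", results
```

In this sketch the test and submission steps are passed in as callables, so the gating logic can be exercised without any grid access.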

This new approach has the advantage of being faster and simpler for analysts while also ensuring that the full processing chain is preserved and reproducible. Using GitLab to manage submissions encourages code review and the sharing of derived datasets between analyses.
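One way to picture how a declarative submission supports reproducibility is to record the declaration together with the repository commit it came from, plus a fingerprint of both. The sketch below is an assumption made for illustration only: `provenance_record` and its field names are hypothetical and do not describe how Analysis Productions stores its bookkeeping.

```python
import hashlib
import json


def provenance_record(declaration: dict, commit: str) -> dict:
    """Build a record tying a job declaration to the commit that defined it.

    The declaration is serialised with sorted keys so that the same content
    always produces the same SHA-256 fingerprint, whatever the key order.
    """
    canonical = json.dumps(declaration, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{commit}\n{canonical}".encode("utf-8")).hexdigest()
    return {"commit": commit, "declaration": declaration, "sha256": digest}
```

Two records with the same fingerprint were produced from the same declaration at the same commit, which is the property a later analyst would rely on when reusing a derived dataset.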

The Analysis Productions system has been stress-tested with legacy data for a couple of years and is becoming the de facto standard by which data, whether legacy or Run 3, is prepared for physics analysis. It has been scaled to analyses that process thousands of datasets, and the approach of testing prior to submission is now being expanded to other production types in LHCb.

Consider for long presentation: Yes

Authors: Neubert, Sebastian (HISKP Bonn); Burr, Chris (CERN)

Presentation materials: 2023-05-09_Analysis_Productions_CHEP_2023.pdf
