High Energy Physics experiments at the Large Hadron Collider generate petabytes of data that go though multiple transformation before final analysis and paper publication. Recording the provenance of these data is therefore crucial to maintain the quality of the final results. While the tools are in place within LHCb to keep this information for the common experiment-wide transforms, analysts have to implement solutions themselves for the steps dealing with ntuples. The gap between centralised and interactive processing can become problematic. In order to facilitate the task, ntuples extracted by LHCb analysts via so-called “Analysis Productions” are tracked in the experiment bookkeeping database and can be enriched with extra information about their meaning and intended use. This information can then be used to access the data more easily: a set of Python tools allow locating the files based on their metadata and integrating their processing within analysis workflows. The tools are designed with the intention of ensuring analysis code continues to be functional into the future and are robust against evolutions in how data is accessed. This paper presents the integration of these new tools within the LHCb codebase and demonstrates how they will be used in LHCb data processing and analysis.
|Consider for long presentation||No|