May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Optimizing ATLAS data storage: the impact of compression algorithms on ATLAS physics analysis data formats

May 11, 2023, 2:15 PM
15m
Hampton Roads Ballroom VI (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Oral, Track 3 - Offline Computing

Speaker

Dr Marcon, Caterina (INFN Milano)

Description

The increased footprint foreseen for Run-3 and HL-LHC data will soon expose
the limits of currently available storage and CPU resources. Data formats
are already optimized according to the processing chain for which they are
designed. ATLAS events are stored in ROOT-based reconstruction output files
called Analysis Object Data (AOD), which are then processed within the
derivation framework to produce Derived AOD (DAOD) files.

Numerous DAOD formats, tailored for specific physics and performance groups,
have been in use throughout the ATLAS Run-2 phase. In view of Run-3, ATLAS
has changed its Analysis Model, which entailed a significant reduction in
the number of DAOD flavors. Two new formats, both unfiltered and skimmable
on read, have been proposed as replacements: DAOD_PHYS, designed to meet the
requirements of most analysis workflows, and DAOD_PHYSLITE, a
smaller format containing already calibrated physics objects. As ROOT-based
formats, they natively support four lossless compression algorithms: Lzma,
Lz4, Zlib and Zstd.
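
As an illustration of how one of these four algorithms is selected, the sketch below encodes an algorithm and a level into a single integer of the form 100 * algorithm + level, which is the convention ROOT documents for its compression setting. The helper names (`compression_setting`, `decode_setting`) are hypothetical, and the numeric algorithm codes are quoted from the ROOT documentation as an assumption that should be checked against the ROOT release in use.

```python
# Algorithm codes assumed from ROOT's RCompressionSetting::EAlgorithm
# (verify against the ROOT version you are using).
ROOT_ALGORITHMS = {"zlib": 1, "lzma": 2, "lz4": 4, "zstd": 5}


def compression_setting(algorithm: str, level: int) -> int:
    """Encode an algorithm name and a level (0-9) as 100 * algorithm + level."""
    if not 0 <= level <= 9:
        raise ValueError("compression level must be between 0 and 9")
    return 100 * ROOT_ALGORITHMS[algorithm.lower()] + level


def decode_setting(setting: int) -> tuple[str, int]:
    """Split an integer setting back into (algorithm name, level)."""
    code, level = divmod(setting, 100)
    names = {value: name for name, value in ROOT_ALGORITHMS.items()}
    return names[code], level


if __name__ == "__main__":
    for name in ("Lzma", "Lz4", "Zlib", "Zstd"):
        print(name, compression_setting(name, 5))
```

A setting built this way is the kind of value that would be passed to a ROOT output file when it is created; the levels actually used in the study are not stated in the abstract.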

In this study, the effects of different compression settings on file size,
compression time, compression factor and reading speed are investigated
considering both DAOD_PHYS and DAOD_PHYSLITE formats. Full as well as
partial event reading strategies have been tested. Moreover, the impact of
AutoFlush and SplitLevel, two parameters controlling how in-memory data
structures are serialized to ROOT files, has been evaluated.
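
The four quantities named above (file size, compression time, compression factor and reading speed) can be measured with a simple harness. The sketch below is not the study's benchmark: it uses only the zlib and lzma modules from the Python standard library as stand-ins for ROOT's compression back ends, and the payload is a hypothetical block of synthetic records rather than ATLAS data. The function name `measure` and the dictionary keys are likewise assumptions made for illustration.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, payload):
    """Time one compress/decompress round trip and report size metrics."""
    t0 = time.perf_counter()
    packed = compress(payload)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    assert restored == payload  # the codecs are lossless
    return {
        "codec": name,
        "compressed_bytes": len(packed),
        "compression_factor": len(payload) / len(packed),
        "compress_seconds": t1 - t0,
        "read_seconds": t2 - t1,
    }


# Hypothetical payload: synthetic text records standing in for event data.
payload = b"".join(
    b"event=%06d;pt=%05d;eta=%03d\n" % (i, i * 7 % 99991, i % 500)
    for i in range(20000)
)

codecs = [
    ("zlib-1", lambda data: zlib.compress(data, 1), zlib.decompress),
    ("zlib-9", lambda data: zlib.compress(data, 9), zlib.decompress),
    ("lzma-6", lambda data: lzma.compress(data, preset=6), lzma.decompress),
]

results = [measure(name, comp, decomp, payload) for name, comp, decomp in codecs]

for row in results:
    print(
        f"{row['codec']:8s} factor={row['compression_factor']:.2f} "
        f"write={row['compress_seconds']:.4f}s read={row['read_seconds']:.4f}s"
    )
```

Timings from a toy payload like this say nothing about DAOD files; the point is only to show how compression factor is defined (uncompressed size divided by compressed size) and how write and read times are separated.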

This study yields quantitative results that can guide the choice of
compression settings for different ATLAS use cases. As an example,
for both DAOD_PHYS and DAOD_PHYSLITE, the Lz4 library exhibits the fastest
reading speed, but results in the largest files, whereas the Lzma algorithm
provides larger compression factors at the cost of significantly slower
reading speeds. In addition, guidelines for setting appropriate AutoFlush
and SplitLevel values are outlined.

Consider for long presentation: No

Primary authors

Dr Marcon, Caterina (INFN Milano)
Dr Carminati, Leonardo (INFN Milano)
van Gemmeren, Peter (Argonne National Laboratory)
Mete, Alaettin Serhan (Argonne National Laboratory (US))
