PHYSLITE - a new reduced common data format for ATLAS

Schaarschmidt, Jana (University of Washington (US))


ATLAS is one of the main experiments at the Large Hadron Collider (LHC), with a diverse physics program covering precision measurements as well as searches for new physics in a wide range of final states, carried out by more than 2600 active authors. The High-Luminosity LHC (HL-LHC) era brings unprecedented computing challenges that call for novel approaches to reducing the volume of stored data and Monte Carlo (MC) simulation while continuing to support this rich physics program.
With the beginning of LHC Run 3, ATLAS introduced a new common data format, PHYS, which replaces most of the individual formats used in Run 2 and thereby significantly reduces disk usage. ATLAS also launched a prototype of another common format, PHYSLITE, which is about a third of the size of PHYS. PHYSLITE will be the main format for the HL-LHC and aims to serve 80% of all physics analyses. To simplify analysis workloads and further reduce disk usage, it is designed to largely replace user-defined analysis n-tuples and therefore contains pre-calibrated objects. PHYSLITE is also intended to support “columnar” data processing techniques, which for some analyses may have significant advantages over the traditional event-loop analysis style. The evolution of data formats, the design principles of PHYSLITE, techniques for file size reduction, and the various forms of validation will be discussed.
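
As a rough illustration of the difference between the event-loop and columnar styles mentioned above, the following Python sketch counts electrons above a transverse-momentum threshold in both ways. It is a toy example only: the data, the `electron_pt` field name, the 25 GeV cut, and all function names are hypothetical and are not taken from PHYSLITE or any ATLAS software. Pure Python lists stand in for the array libraries (such as NumPy) that a real columnar analysis would more likely rely on.

```python
# Hypothetical toy data: each event holds a variable-length list of
# electron transverse momenta in GeV. Not an actual PHYSLITE layout.
events = [
    {"electron_pt": [32.1, 12.4]},
    {"electron_pt": []},
    {"electron_pt": [55.0]},
    {"electron_pt": [8.9, 27.3, 41.6]},
]
PT_CUT = 25.0  # assumed threshold, for illustration only


def event_loop_count(events, pt_cut):
    """Event-loop style: visit one event at a time, then each object in it."""
    counts = []
    for event in events:
        n_pass = 0
        for pt in event["electron_pt"]:
            if pt > pt_cut:
                n_pass += 1
        counts.append(n_pass)
    return counts


def to_columns(events):
    """Flatten the per-event lists into one contiguous column of values
    plus an offsets list marking where each event starts and ends."""
    values = []
    offsets = [0]
    for event in events:
        values.extend(event["electron_pt"])
        offsets.append(len(values))
    return values, offsets


def columnar_count(values, offsets, pt_cut):
    """Columnar style: apply the cut to the whole column in one pass,
    then reduce the resulting mask per event using the offsets."""
    mask = [pt > pt_cut for pt in values]
    return [sum(mask[start:stop]) for start, stop in zip(offsets, offsets[1:])]


values, offsets = to_columns(events)
loop_result = event_loop_count(events, PT_CUT)
column_result = columnar_count(values, offsets, PT_CUT)
print(loop_result, column_result)
```

Both functions return the same per-event counts. The columnar version separates the selection (one operation over the full column) from the per-event bookkeeping (the offsets), which is the property that lets array libraries process many events at once.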

