ALICE (A Large Ion Collider Experiment) underwent a major upgrade during the LHC Long Shutdown 2 (LS2). The higher detector data rates, and in particular the continuous readout of the Time Projection Chamber (TPC), led to a hundredfold increase in the raw input data rate, up to 3.5 TB/s. To cope with this, a new common Online and Offline computing system, called O2, has been developed and put into production.
The online Data Quality Monitoring (DQM) and the offline Quality Assurance (QA) are critical parts of the data acquisition and reconstruction software chains. The former aims to give shifters precise and complete information so that they can quickly identify and overcome problems, while the latter aims to select good-quality data for physics analyses. Both DQM and QA typically involve gathering data, analyzing it in a distributed way with user-defined algorithms, merging the resulting objects and visualizing them.
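The gather, analyze, merge and visualize chain described above can be illustrated with a minimal Python sketch. It is not the O2 QC API: monitoring objects are modelled as plain `Counter` histograms, a thread pool stands in for distributed processing on many nodes, and the names `analyse_chunk`, `merge`, `check_mean_within` and `render` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Hypothetical monitoring object: a histogram mapping bin index -> entry count.
Histogram = Counter


def analyse_chunk(samples: Iterable[float], bin_width: float = 1.0) -> Histogram:
    """Hypothetical user-defined task: histogram one chunk of data."""
    return Counter(int(x // bin_width) for x in samples)


def merge(partials: List[Histogram]) -> Histogram:
    """Combine partial histograms from parallel workers by summing bin counts."""
    total: Histogram = Counter()
    for h in partials:
        total.update(h)  # Counter.update adds counts rather than replacing them
    return total


def run_pipeline(chunks: List[List[float]],
                 task: Callable[[Iterable[float]], Histogram]) -> Histogram:
    # Stand-in for distribution: each chunk is analysed by a worker thread,
    # then the partial results are merged into a single object.
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(task, chunks))
    return merge(partials)


def check_mean_within(h: Histogram, lo: float, hi: float) -> str:
    """Hypothetical quality check on the merged object:
    GOOD if the mean bin index lies in [lo, hi], BAD otherwise, NULL if empty."""
    n = sum(h.values())
    if n == 0:
        return "NULL"
    mean_bin = sum(b * c for b, c in h.items()) / n
    return "GOOD" if lo <= mean_bin <= hi else "BAD"


def render(h: Histogram) -> str:
    """Crude text 'visualization': one row of '#' per bin."""
    return "\n".join(f"{b:>3} {'#' * c}" for b, c in sorted(h.items()))


# Usage: three chunks of data, as if gathered from different sources.
chunks = [[0.5, 1.2, 1.8], [2.1, 1.5], [0.1, 3.9]]
merged = run_pipeline(chunks, analyse_chunk)
print(render(merged))
print(check_mean_within(merged, 0.5, 2.0))
```

Merging by summing bin counts works because histograms filled from disjoint chunks of data are additive, which is what allows the analysis to be split across workers and the partial results to be combined afterwards.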
This paper discusses the final architecture and design of the Quality Control (QC) system, which runs both synchronously with data taking and asynchronously on the Worldwide LHC Computing Grid. Following the successful first year of data taking with beam, we present our experience and the lessons we learned, before and after the LHC restart, when monitoring data quality in a real-world, challenging environment. Finally, we illustrate the wide range of ways in which the system is used through a few carefully selected use cases.