During the long shutdown between LHC Runs 2 and 3, the 2017 and 2018 CMS data were reprocessed with higher-granularity data quality monitoring (DQM) harvesting, increasing the time granularity of the DQM histograms by three orders of magnitude. In anticipation of deploying this higher-granularity DQM harvesting in the ongoing Run 3 data taking, this dataset is used to study the application of machine learning (ML) techniques to data certification, with the goal of developing tools for online monitoring and offline certification. In this talk, we will discuss the challenges and present some of the results, illustrating the tools developed for CMS Tracker Data Quality Monitoring and Certification. The studies consider anomaly detection both in the context of reprocessing campaigns, when all the data are available, and in the context of continuous data taking, when conditions change constantly and models must be trained on previously collected data taken under similar conditions. The data are augmented with information from the CMS Online Monitoring System (luminosity, pile-up, LHC fill, run and trigger), from the CMS Run Registry (sub-detector certification flags), and from the CMS conditions database (calibrations). The status of the web application integrating these data sources and facilitating the development, testing and benchmarking of ML models will be presented, using a few test cases.
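As a purely illustrative sketch of the anomaly-detection use case described above (not the collaboration's actual method), one simple baseline is to score per-lumisection DQM histograms by their reconstruction error under a PCA model fitted on reference data, flagging lumisections whose error exceeds a quantile-based threshold. All data below are synthetic stand-ins; the histogram shapes, bin counts and threshold choice are assumptions for the example only.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit a simple PCA via SVD; return the mean and top principal directions."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def reconstruction_error(X, mean, components):
    """Per-row L2 distance between a histogram and its PCA reconstruction."""
    Xc = X - mean
    proj = Xc @ components.T @ components  # project onto the PCA subspace
    return np.linalg.norm(Xc - proj, axis=1)

rng = np.random.default_rng(0)
# Synthetic stand-in for per-lumisection occupancy histograms (50 bins each)
good = rng.normal(1.0, 0.05, size=(200, 50))
mean, comps = fit_pca(good, n_components=5)
# Threshold at the 99th percentile of the reference-sample errors
threshold = np.quantile(reconstruction_error(good, mean, comps), 0.99)

bad = good[:5].copy()
bad[:, 10:20] *= 0.2  # simulate a dead detector region (depleted occupancy)
flags = reconstruction_error(bad, mean, comps) > threshold
```

In the continuous data-taking scenario mentioned above, the reference sample would be restricted to recent runs taken under similar conditions before fitting, so that the model tracks the evolving detector state.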
Consider for long presentation: Yes