ALICE, one of the four large experiments at the CERN LHC, is a detector dedicated to heavy-ion physics. At high interaction rates, the pile-up of multiple events creates conditions that require advanced multidimensional data analysis methods.
Machine learning (ML) has become very popular for multidimensional data analysis in recent years. Compared with the simple, low-dimensional analytical approaches used in the past, ML models are more difficult to interpret, and their uncertainties are harder to evaluate. On the other hand, oversimplifying the analysis and reducing its dimensionality can make the resulting explanations more complex or even wrong.
Our goal was to provide a tool for dealing with multidimensional problems: to simplify data analysis in many (ideally all relevant) dimensions; to fit and visualize multidimensional functions, including their uncertainties and biases; to validate assumptions and approximations; to easily define functional compositions of analytical parametric and non-parametric functions; to exploit symmetries; and to define multidimensional "invariant" functions and alarms.
Using a combination of lossy and lossless data compression, datasets with, for example, O(10^7) entries and O(25) attributes can be analyzed interactively in a standalone browser application using O(500 MB) of memory. By applying a suitable representative downsampling by a factor of O(10^-2) to O(10^-3), followed by reweighting or pre-aggregation on the server or batch farm, the effective monthly or annual ALICE statistics can be analyzed interactively in many dimensions for calibration and reconstruction validation, QA/QC, or statistical and physics analysis.
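The two data-reduction ideas mentioned above can be illustrated with a minimal sketch, assuming NumPy and the standard-library `zlib`. It is not the tool's actual implementation: the function names `compress_column`, `decompress_column` and `downsample_flat` are hypothetical, the lossy step simply rounds each value to a fixed absolute precision before lossless zlib compression, and "representative downsampling" is approximated here by keeping roughly a fixed number of entries per histogram bin of one variable, with each kept entry weighted by the inverse of its keep probability so that weighted sums stay unbiased.

```python
import zlib

import numpy as np


def compress_column(values, precision):
    """Hypothetical helper: lossy rounding to a fixed absolute precision,
    followed by lossless zlib compression of the integer codes."""
    codes = np.round(np.asarray(values, dtype=np.float64) / precision).astype(np.int32)
    return zlib.compress(codes.tobytes()), codes.dtype, precision


def decompress_column(blob, dtype, precision):
    """Inverse of compress_column; the rounding error is at most precision / 2."""
    codes = np.frombuffer(zlib.decompress(blob), dtype=dtype)
    return codes.astype(np.float64) * precision


def downsample_flat(x, bins, target_per_bin, rng):
    """Hypothetical helper: keep about target_per_bin entries in each bin of x.

    Returns the indices of kept entries and their weights (1 / keep probability),
    so the sum of weights estimates the original number of entries.
    """
    counts, edges = np.histogram(x, bins=bins)
    # Map each entry to its bin; the maximum value falls on the last edge, so clip it.
    idx = np.clip(np.digitize(x, edges) - 1, 0, len(counts) - 1)
    # Densely populated bins are thinned, sparse bins (e.g. tails) are kept whole.
    p = np.minimum(1.0, target_per_bin / np.maximum(counts, 1))[idx]
    keep = rng.random(len(x)) < p
    return np.flatnonzero(keep), 1.0 / p[keep]
```

For a steeply falling spectrum, for example `x = rng.exponential(1.0, 1_000_000)` with `downsample_flat(x, 50, 200, rng)`, only about 1% of the entries survive while the rare tail is kept in full; weighting by the inverse keep probability is what allows distributions of the reduced sample to be compared with the full statistics.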
In this contribution, we introduce the main features of our general-purpose statistical tool and demonstrate them with examples from ALICE used in developing simulations, calibrations, and reconstruction for combined particle identification, the spatial point distortion algorithm, and the combined multiplicity-centrality estimators.