Neural Networks (NN) are often trained offline on large datasets and deployed on specialized hardware for inference, with a strict separation between training and inference. However, in many realistic applications the training environment differs from the real world or data arrive in a streaming fashion and are continuously changing. In these scenarios, the ability to continuously train and update NN models is desirable.
Continual learning (CL) algorithms allow training of models over a stream of data. CL algorithms are often designed to work in constrained settings, such as limited memory and computational power, or limitations on the ability to store past data (e.g., due to privacy concerns or memory requirements). The most basic online learning suffers from “catastrophic forgetting”, where knowledge from initial or previous training is lost. CL aims to mitigate this effect through the use of different learning algorithms.
High-energy physics experiments are developing intelligent detectors, with algorithms running on computer systems located close to the detector to meet the challenges of increased data rates and occupancies. The use of NN algorithms in this context is limited by changing detector conditions, such as degradation over time or failure of an input signal which might cause the NNs to lose accuracy leading, in the worst case, to loss of interesting events.
CL has the potential to solve this issue, using large amounts of continuously streaming data to allow the network to recognize changes to learn and adapt to detector conditions. It has the potential to outperform traditional NN training techniques as not all possible scenarios can be predicted and modeled in static training data samples.
However, NN training is computationally expensive and when combined with the strict timing requirements of embedded processors deployed close to the detector, current state-of-the-art offline approaches cannot be directly applied in real-time systems. Alternatives to typical backpropagation-based training that can be deployed on FPGAs for real-time data processing are presented, and their computational and accuracy characteristics are discussed in the context of HL-LHC.
|Consider for long presentation||No|