Description
Deploying Machine Learning (ML) applications in a production environment requires verification, validation, assurance, and trust. ML models are notoriously difficult to maintain in such environments, where data and systems may evolve over time and long-term maintenance is required. The models require active management for (1) reproducing or replicating model weights, (2) monitoring data drift, (3) tracking model performance, and (4) updating models. We will present a Machine Learning Operations (MLOps) framework designed to ensure a sustainable develop-deploy-monitor paradigm for accelerator control systems, along with an overview of R&D to enable ML capabilities for accelerator operations. The R&D is being initiated by the Accelerator Controls Operations Research Network (ACORN), a DOE O 413.3B project to modernize Fermilab’s accelerator control system in preparation for operations with megawatt particle beams.