Description
The GRID computing paradigm adopted by the main HEP experiments is based on distributing experimental data across computing resources located all over the world. In general, data distribution is managed centrally by services such as the CERN File Transfer Service (FTS), which interact with the local Storage Elements. When bulk data transfers are performed at such a large scale, various issues may occur, and a network of experts on duty is required to follow up on problematic transfers.
At the same time, all data transfer processes are tracked in log files produced by the various services involved, and these log files are a largely underutilized source of information.
In this paper, we present an approach based on unsupervised machine learning (ML) techniques to automatically process the information stored in these log files.
Two clustering algorithms, K-means and DBSCAN, were applied to two different log file datasets, one provided by FTS at CERN and one by the StoRM service at the INFN-CNAF Tier-1 data center. The aim is to group the error messages and to provide operators with a tool that speeds up error detection and problem resolution.
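To illustrate the general idea of grouping error messages by density-based clustering, here is a minimal, self-contained sketch of DBSCAN written in plain Python. It is not the authors' pipeline. The example messages, the masking of paths and numbers, the token-set Jaccard distance, and the parameter values (`eps=0.3`, `min_samples=2`) are all hypothetical choices made for illustration. K-means, the other algorithm named above, would additionally require a numeric vector representation of each message and a number of clusters fixed in advance, so it is not shown here.

```python
import re


def tokenize(message):
    """Normalize an error message into a set of tokens (illustrative assumption).

    Variable parts (anything containing '/', decimal and hex numbers) are masked
    so that messages differing only in such details map to the same token set.
    """
    message = message.lower()
    message = re.sub(r"\S*/\S*", "<path>", message)  # paths and URLs
    message = re.sub(r"\b0x[0-9a-f]+\b|\b\d+\b", "<num>", message)
    return frozenset(re.findall(r"<\w+>|[a-z_]+", message))


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty token sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(points, eps, min_samples, dist):
    """Plain DBSCAN. Returns one label per point; -1 marks noise.

    min_samples counts the point itself, as in the usual DBSCAN definition.
    """
    def region(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: expand
    return labels


# Hypothetical messages, not real FTS or StoRM log lines.
messages = [
    "TRANSFER ERROR: Connection timed out after 300 seconds to srm://se1.example.org/data/f1",
    "TRANSFER ERROR: Connection timed out after 120 seconds to srm://se2.example.org/data/f7",
    "TRANSFER ERROR: Connection timed out after 300 seconds to srm://se3.example.org/x",
    "CHECKSUM MISMATCH: source 0x1a2b destination 0x3c4d",
    "CHECKSUM MISMATCH: source 0xffff destination 0x0001",
    "CHECKSUM MISMATCH: source 0x1234 destination 0x5678",
    "Permission denied on /storage/atlas/file9",
]
points = [tokenize(m) for m in messages]
labels = dbscan(points, eps=0.3, min_samples=2, dist=jaccard_distance)
for label, message in zip(labels, messages):
    print(label, message)
```

Here the three timeout messages form one cluster, the three checksum messages form another, and the isolated permission error is labeled as noise (`-1`). A density-based method is convenient in this setting because the number of distinct error categories does not have to be known in advance, and rare, unusual messages are flagged as noise rather than forced into a cluster.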
Consider for long presentation: No