May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Using ML clustering tools to improve data transfer management operations

Not scheduled
1h
Hampton Roads Ballroom and Foyer Area (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Poster (Poster Session)

Speaker

Rinaldi, Lorenzo

Description

The GRID computing paradigm adopted by the main HEP experiments is based on distributing experimental data across computing resources located all over the world. In general, data distribution is managed centrally by services such as the CERN File Transfer Service (FTS), which interact with the local Storage Elements. During bulk data transfers at such a large scale, various issues may occur, and a network of experts on duty is required to follow up on problematic transfers.
At the same time, every data transfer process is tracked in log files produced by the various services involved, and these log files are a largely underutilized source of information.
In this paper we present an approach based on unsupervised ML techniques that automatically processes the information stored in log files.
Two clustering algorithms, K-means and DBSCAN, were applied to two logfile datasets, one provided by FTS at CERN and one by the StoRM service at the INFN-CNAF Tier-1 data center, with the aim of grouping the error messages and giving operators a tool that speeds up error detection and problem solving.
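
As a rough illustration of how error messages can be grouped without labels, the sketch below runs a hand-written DBSCAN over a handful of log lines. It is not the authors' pipeline: the abstract does not say how messages were turned into features, so the token masking in `normalize`, the Jaccard distance, the `eps` and `min_samples` values, and the sample messages themselves are all assumptions made for this example. K-means is left out because it would additionally need numeric vectors and a chosen number of clusters.

```python
import re


def normalize(message):
    """Mask variable fields so messages with the same cause share tokens.

    Assumption: URLs and numbers are the only variable parts worth masking.
    """
    message = message.lower()
    message = re.sub(r"[a-z]+://\S+", "<url>", message)  # storage URLs
    message = re.sub(r"\d+", "<num>", message)           # ids, ports, sizes
    return re.findall(r"<\w+>|[a-z_]+", message)


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| computed over the two token sets."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def dbscan(points, eps, min_samples, dist):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force region query; the point itself is included.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


# Hypothetical messages invented for illustration, not real FTS or StoRM logs.
messages = [
    "Transfer 1021 failed: connection timed out to gsiftp://se01.example.org:2811/data/f1",
    "Transfer 1877 failed: connection timed out to gsiftp://se07.example.org:2811/data/f9",
    "Transfer 2040 failed: connection timed out to gsiftp://se03.example.org:2811/data/f4",
    "Checksum mismatch for file 88 on srm://storm.example.org/atlas/a.root",
    "Checksum mismatch for file 12 on srm://storm.example.org/atlas/b.root",
    "Checksum mismatch for file 431 on srm://storm.example.org/cms/c.root",
    "Permission denied while creating directory /pool/7",
]

tokens = [normalize(m) for m in messages]
labels = dbscan(tokens, eps=0.3, min_samples=2, dist=jaccard_distance)
# labels: [0, 0, 0, 1, 1, 1, -1]

for label, message in zip(labels, messages):
    print(label, message)
```

In this toy input the three timeouts and the three checksum errors collapse into one cluster each once their variable fields are masked, while the single permission error is reported as noise (-1). That is the kind of grouping an operator could scan instead of reading every line.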

Consider for long presentation: No

Primary authors

Clissa, Luca (Bologna University and INFN)
Dr Morganti, Lucia (INFN-CNAF)
Rinaldi, Lorenzo
