May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Analyzing, Identifying and Alerting on Network Issues

Not scheduled
1h
Hampton Roads Ballroom and Foyer Area (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Poster (Poster Session)

Speaker

McKee, Shawn (University of Michigan Physics)

Description

WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The OSG Networking Area, in partnership with the WLCG Throughput working group, has created a monitoring infrastructure that gathers metrics from the global WLCG perfSONAR deployment and centrally stores the measurements in Elasticsearch.

In this presentation, we will describe the ongoing work to proactively analyze, correlate and alert on various network and infrastructure issues. We will discuss the methods and techniques applied, the systems developed, and the challenges in the measurements that make it difficult to identify problems or to attribute them to the appropriate location(s). Lastly, we will describe our future plans to incorporate AI/ML by appropriately annotating data based on the types of issues discovered.
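
As a rough illustration of the kind of alerting described above, the following sketch flags source-destination pairs whose mean packet loss exceeds a threshold. It is not the authors' method, which the abstract does not specify. The record layout, the field names `src`, `dest` and `packet_loss`, the function name `find_lossy_paths`, the sample data, and the 2% threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical threshold: alert when mean packet loss on a path exceeds 2%.
LOSS_THRESHOLD = 0.02


def find_lossy_paths(records, threshold=LOSS_THRESHOLD):
    """Return {(src, dest): mean_loss} for paths whose mean loss exceeds threshold.

    `records` is an iterable of dicts with the hypothetical keys
    'src', 'dest' and 'packet_loss' (a fraction between 0 and 1).
    """
    # Group the loss measurements by (source, destination) pair.
    losses = defaultdict(list)
    for rec in records:
        losses[(rec["src"], rec["dest"])].append(rec["packet_loss"])

    # Keep only the paths whose average loss is above the threshold.
    return {
        path: mean(values)
        for path, values in losses.items()
        if mean(values) > threshold
    }


# Made-up sample measurements, for illustration only.
sample = [
    {"src": "site-a", "dest": "site-b", "packet_loss": 0.00},
    {"src": "site-a", "dest": "site-b", "packet_loss": 0.01},
    {"src": "site-a", "dest": "site-c", "packet_loss": 0.05},
    {"src": "site-a", "dest": "site-c", "packet_loss": 0.07},
]

alerts = find_lossy_paths(sample)
for (src, dest), loss in alerts.items():
    print(f"ALERT: {src} -> {dest} mean packet loss {loss:.1%}")
```

In a real deployment the records would presumably be read from the central measurement store rather than from an in-memory list. A fixed threshold is only the simplest possible rule and says nothing about where along a path the problem lies.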

Consider for long presentation: No

Primary authors

Vasileva, Petya (University of Michigan)
Babik, Marian (CERN)
McKee, Shawn (University of Michigan Physics)
Dr Vukotic, Ilija (University of Chicago)
