May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

EJFAT: Accelerated Intelligent Compute Destination Load Balancing

May 11, 2023, 11:15 AM
15m
Marriott Ballroom VII (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510

Speaker

Goodrich, Michael

Description

To increase the science rate for high data rates and volumes, Thomas Jefferson National Accelerator Facility (JLab) has partnered with Energy Sciences Network (ESnet) to define an edge-to-data-center traffic shaping and steering transport capability featuring data-event-aware network shaping and forwarding.

The keystone of this ESnet JLab FPGA Accelerated Transport (EJFAT) is the joint development of a dynamic compute-work Load Balancer (LB) for UDP-streamed data. The LB's centerpiece is a Field Programmable Gate Array (FPGA), which executes a dynamically configurable, low fixed-latency LB data plane featuring real-time packet redirection at high throughput. A control plane running on the FPGA's host computer monitors network and compute-farm telemetry in order to make dynamic AI/ML-guided decisions. These decisions include destination compute host redirection and load balancing.
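As a rough illustration of the data plane / control plane split described above, the following Python sketch separates a slow "control" step that turns telemetry into a lookup table from a fast "data" step that redirects each packet by event number. It is not the EJFAT implementation: the slot-table approach, the names `build_slot_table` and `route`, the table size, and the use of a reported free-capacity fraction as the telemetry signal are all assumptions made for illustration, and simple proportional weighting stands in for the AI/ML-guided decisions.

```python
from typing import Dict, List


def build_slot_table(free_fraction: Dict[str, float], slots: int = 16) -> List[str]:
    """Control-plane step (hypothetical): give each destination a share of
    table slots proportional to the free capacity it reports."""
    total = sum(free_fraction.values())
    if total <= 0:
        raise ValueError("no destination reports free capacity")
    # Largest-remainder apportionment so the table has exactly `slots` entries.
    exact = {d: slots * f / total for d, f in free_fraction.items()}
    counts = {d: int(x) for d, x in exact.items()}
    leftover = slots - sum(counts.values())
    by_remainder = sorted(exact, key=lambda d: exact[d] - counts[d], reverse=True)
    for d in by_remainder[:leftover]:
        counts[d] += 1
    table: List[str] = []
    for d in sorted(counts):
        table.extend([d] * counts[d])
    return table


def route(event_number: int, table: List[str]) -> str:
    """Data-plane step: a fixed-cost lookup. Packets that carry the same
    event number always map to the same destination."""
    return table[event_number % len(table)]


if __name__ == "__main__":
    # Hypothetical telemetry: node1 reports three times the free capacity of node2.
    table = build_slot_table({"node1": 0.75, "node2": 0.25}, slots=8)
    for event in range(8):
        print(event, route(event, table))
```

Keeping the per-packet path to a single table lookup mirrors the low, fixed latency the abstract attributes to the data plane, while the table itself can be rebuilt whenever new telemetry arrives.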

The LB provides three forms of scaling. First, it scales horizontally: adding more FPGAs increases bandwidth. Second, it sets the number of core compute hosts independently of the number of source DAQs. Third, it allows a flexible number of CPUs and threads per host, treating each receiving thread as an independent LB destination. The LB provides seamless integration of edge and core computing to support direct experimental data processing. Immediate use will be in JLab science programs and others such as the Electron Ion Collider (EIC). Data centers of the future will need high throughput and low latency for both live-streamed and recorded data, for running experiment data acquisition, analysis, and data center use cases.
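The third form of scaling, in which each receiving thread is its own LB destination, can be pictured as flattening a per-host thread count into a list of endpoints. The sketch below is only an assumption about how that might look: the function name `expand_destinations`, the one-UDP-port-per-thread convention, and the base port value are hypothetical and not taken from EJFAT.

```python
from typing import Dict, List, Tuple


def expand_destinations(
    threads_per_host: Dict[str, int], base_port: int = 19000
) -> List[Tuple[str, int]]:
    """Return one (host, port) destination per receiving thread.

    Assumes (hypothetically) that thread i on a host listens on base_port + i.
    """
    destinations: List[Tuple[str, int]] = []
    for host in sorted(threads_per_host):
        for i in range(threads_per_host[host]):
            destinations.append((host, base_port + i))
    return destinations


if __name__ == "__main__":
    # Two hosts with different thread counts yield three independent destinations.
    print(expand_destinations({"host1": 2, "host2": 1}))
```

Because the load balancer sees only the flattened list, hosts with more CPUs or threads simply contribute more destinations, and the number of hosts stays independent of the number of data sources.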

EJFAT is in development for production use within DOE. When completed, it will have an operational impact on integrated research infrastructure as called for in planning documents for Exascale, Nuclear Physics, and Scientific Computing. It demonstrates a new load balancing architecture.

Consider for long presentation: Yes

Primary authors

Goodrich, Michael
Timmer, Carl (Thomas Jefferson National Accelerator Facility)
Gyurjyan, Vardan (Jefferson Lab)
Lawrence, David (Jefferson Lab)
Heyes, Graham (Jefferson Lab)
Kumar, Yatish (ESnet)
Sheldon, Stacey (ESnet)
