Data-intensive sciences place increasing demands on the efficiency and flexibility of their data transport systems. As data volumes grow, it is essential that the transport system of a data-intensive science project fully utilize all available transport resources (e.g., network bandwidth). At the same time, multiple projects increasingly share the same transport infrastructure to achieve statistical multiplexing gain, but wide deployment of a shared infrastructure requires flexible resource control. In this talk, we first conduct a rigorous analysis of existing data transport systems and show that treating the infrastructure as a black box can limit efficiency and flexibility. We then introduce ALTO/TCN, a new architecture that provides deep infrastructure visibility to achieve efficient, flexible data transport. We will provide additional details on three key components that realize the architecture: (1) how to achieve infrastructure visibility in multi-domain networks, using the Internet Engineering Task Force (IETF) Application-Layer Traffic Optimization (ALTO) protocol and the openalto.org visibility orchestrator; (2) how to integrate visibility into transport scheduling optimization, with zero-order/first-order gradient and time-multiplexing control, using FTS integration as an example; and (3) how to integrate visibility into data selection orchestration, with general distances as a visibility abstraction, using Rucio integration as an example. We will report evaluation results and implementation lessons. We conclude with plans for the next steps, in particular how the project complements existing related efforts in HEP, such as application awareness (e.g., packet marking) and adaptive networking resource allocation (e.g., NOTED/SENSE/AutoGOLE).
|Consider for long presentation||Yes|