
May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

CernVM-FS at Extreme Scales

May 9, 2023, 11:30 AM
15m
Marriott Ballroom II-III (Norfolk Waterside Marriott)

235 East Main Street, Norfolk, VA 23510
Oral | Track 4 - Distributed Computing

Speaker

Promberger, Laura (CERN)

Description

The CernVM File System (CVMFS) provides the software distribution backbone for High Energy and Nuclear Physics experiments and many other scientific communities, in the form of a globally available shared software area. It was designed to solve the software distribution problem of the experiment software for LHC Runs 1 and 2.

For LHC Run 3, and even more so for the HL-LHC (Runs 4-6), the complexity of the experiment software stacks and their build pipelines is substantially larger. For instance, software is being distributed for several CPU architectures, often in the form of containers that include base and operating system libraries; the number of external packages, such as machine learning libraries, has multiplied; and there is a shift from C++ to more Python-heavy software stacks, which results in more and smaller files needing to be distributed. For CVMFS, the new software landscape means an order-of-magnitude increase in scale in key metrics such as the number of files, the number of system calls, and the number of concurrent processes accessing the file system client.

In this contribution, we report on the performance and reliability engineering of the file system client to sustain the current and expected future software access load. Concretely, we show the impact of the newly designed file system cache management, including upstreamed improvements to the fuse kernel module itself; improved utilization of network links and caches (such as link optimization, prefetching, and proxy sharding); and operational improvements in network failure handling, error reporting, and integration with container runtimes. Overall, the new CVMFS client is designed to sustain applications with more than one million file lookups during startup, nodes with hundreds of cores, and thousands of concurrent processes accessing software through the file system client.
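The proxy sharding and cache sizing mentioned in the abstract are exposed through the CVMFS client configuration. The following is a minimal illustrative sketch, not the configuration discussed in the talk; the host names, repository list, and quota value are assumptions. In CVMFS_HTTP_PROXY, proxies separated by "|" form a load-balanced group, and ";" separates failover groups:

    # /etc/cvmfs/default.local -- illustrative sketch; values are assumptions
    CVMFS_REPOSITORIES=atlas.cern.ch,cms.cern.ch

    # Proxy sharding and failover: "|" load-balances within a group,
    # ";" falls over to the next group.
    CVMFS_HTTP_PROXY="http://proxy1.example.org:3128|http://proxy2.example.org:3128;http://backup-proxy.example.org:3128"

    # Local cache location and soft quota in megabytes; many-core nodes
    # running thousands of concurrent processes benefit from a large cache.
    CVMFS_CACHE_BASE=/var/lib/cvmfs
    CVMFS_QUOTA_LIMIT=25000

After editing the file, cvmfs_config reload applies the new settings and cvmfs_config probe verifies that the configured repositories mount correctly.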

Consider for long presentation: No
