Conveners
Track 1 - Data and Metadata Organization, Management and Access: Storage
- Mario Lassnig (CERN)
- Martin Barisits (CERN)
Track 1 - Data and Metadata Organization, Management and Access: Networks
- Mario Lassnig (CERN)
- Diego Davila (University of California, San Diego)
Track 1 - Data and Metadata Organization, Management and Access: Clouds & Caches
- Martin Barisits (CERN)
- Michael Kirby (FNAL)
Track 1 - Data and Metadata Organization, Management and Access: Tapes
- Diego Davila (University of California, San Diego)
- Mario Lassnig (CERN)
Track 1 - Data and Metadata Organization, Management and Access: Databases & Metadata
- Mario Lassnig (CERN)
- Martin Barisits (CERN)
Track 1 - Data and Metadata Organization, Management and Access: Data Management
- Michael Kirby (FNAL)
- Martin Barisits (CERN)
Track 1 - Data and Metadata Organization, Management and Access: Analytics & Benchmarks
- Diego Davila (University of California, San Diego)
- Michael Kirby (FNAL)
The dCache project provides open-source software deployed internationally to satisfy
ever more demanding storage requirements. Its multifaceted approach provides an integrated
way of supporting different use-cases with the same storage, from high-throughput data
ingest, data sharing over wide area networks, efficient access from HPC clusters and
long-term data persistence on a tertiary...
XRootD implemented a client-side erasure coding (EC) algorithm utilizing the Intel Intelligent Storage Acceleration Library. At SLAC, a prototype of XRootD EC storage was set up for evaluation. The architecture and configuration of the prototype are almost identical to those of a traditional non-EC XRootD storage behind a firewall: a backend XRootD storage cluster in its simplest form, and an...
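As a rough illustration of the client-side striping idea described in the abstract above, the sketch below splits a block into k data stripes plus a single XOR parity stripe and rebuilds one lost stripe. This is a deliberate simplification: the actual XRootD EC implementation uses Reed-Solomon coding from Intel ISA-L, and all function names here are illustrative assumptions.

```python
# Simplified sketch of client-side striping with one XOR parity stripe.
# The real XRootD EC client uses Reed-Solomon coding (Intel ISA-L); this
# example only shows how a block is split into k data stripes plus parity
# before the stripes are written to separate servers.

def encode_block(block: bytes, k: int):
    """Split a block into k equal data stripes plus one XOR parity stripe."""
    stripe_len = -(-len(block) // k)              # ceiling division
    padded = block.ljust(k * stripe_len, b"\0")   # pad the final stripe
    stripes = [padded[i * stripe_len:(i + 1) * stripe_len] for i in range(k)]
    parity = bytearray(stripe_len)
    for stripe in stripes:
        for i, b in enumerate(stripe):
            parity[i] ^= b
    return stripes, bytes(parity)

def recover_missing(stripes, parity):
    """Rebuild a single missing data stripe (marked as None) from the rest."""
    missing = [i for i, s in enumerate(stripes) if s is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can repair at most one lost stripe")
    if missing:
        rebuilt = bytearray(parity)
        for s in stripes:
            if s is not None:
                for i, b in enumerate(s):
                    rebuilt[i] ^= b
        stripes[missing[0]] = bytes(rebuilt)
    return stripes
```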
INFN-CNAF is one of the Worldwide LHC Computing Grid (WLCG) Tier-1 data centers, providing computing, networking and storage resources and services to a wide variety of scientific collaborations, ranging from physics to bioinformatics and industrial engineering.
Recently, several collaborations working with our data center have developed computing and data management...
The Storage Group in the CERN IT Department operates several Ceph storage clusters with an overall capacity exceeding 100 PB. Ceph is a crucial component of the infrastructure delivering IT services to all the users of the Organization as it provides: i) Block storage for the OpenStack infrastructure, ii) CephFS used as persistent storage by containers (OpenShift and Kubernetes) and as shared...
Data access at the UK Tier-1 facility at RAL is provided through its ECHO storage, serving the requirements of the WLCG and an increasing number of other HEP and astronomy-related communities.
ECHO is a Ceph-backed erasure-coded object store, currently providing in excess of 40 PB of usable space, with frontend access to data provided via XRootD or GridFTP, using the libradosstriper library of...
EOS has been the main storage system at CERN for more than a decade, continuously improving in order to meet the ever-evolving requirements of the LHC experiments and the whole physics user community. In order to satisfy the demands of LHC Run-3, in terms of storage performance and tradeoff between cost and capacity, EOS was enhanced with a set of new functionalities and features that we will...
The Large Hadron Collider (LHC) experiments distribute data by leveraging a diverse array of National Research and Education Networks (NRENs), where experiment data management systems treat networks as a “black box” resource. After the High Luminosity upgrade, the Compact Muon Solenoid (CMS) experiment alone will produce roughly 0.5 exabytes of data per year. NRENs are a critical part...
We present an NDN-based Open Storage System (OSS) plugin for XRootD instrumented with an accelerated packet forwarder, built for data access in the CMS and other experiments at the LHC, together with its current status, performance as compared to other tools and applications, and plans for ongoing developments.
Named Data Networking (NDN) is a leading Future Internet Architecture where data...
There is increasing demand for the efficiency and flexibility of data transport systems supporting data-intensive sciences. With growing data volume, it is essential that the transport system of a data-intensive science project fully utilize all available transport resources (e.g., network bandwidth); to achieve statistical multiplexing gain, there is an increasing trend that multiple projects...
In 2029 the LHC will start the High-Luminosity LHC (HL-LHC) program, with a boost in the integrated luminosity resulting in an unprecedented amount of experimental and simulated data samples to be transferred, processed and stored in disk and tape systems across the Worldwide LHC Computing Grid (WLCG). Content delivery network (CDN) solutions are being explored with the purposes of improving...
The High-Energy Physics (HEP) and Worldwide LHC Computing Grid (WLCG) communities have faced significant challenges in understanding their global network flows across the world’s research and education (R&E) networks. When critical links, such as transatlantic or transpacific connections, experience high traffic or saturation, it is very challenging to clearly identify which collaborations...
The capture and curation of all primary instrument data is a potentially valuable source of added insight into experiments or diagnostics in laboratory experiments. The data can, when properly curated, enable analysis beyond the current practice that uses just a subset of the as-measured data. Complete curated data can also be input for machine learning and other data exploration tools....
The XRootD S3 Gateway is a universal high-performance proxy service that can be used to access S3 portals using existing HEP credentials (e.g. JSON Web Tokens and X.509). This eliminates one of the biggest roadblocks to using public cloud storage resources. This paper describes how the S3 Gateway leverages existing HEP software (e.g. Davix and XRootD) to provide a familiar scalable service that...
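A minimal sketch of the client-side pattern such a gateway enables: presenting a HEP bearer token (e.g. a JSON Web Token) over HTTPS to retrieve an object that ultimately lives in S3. The gateway URL, object path and token file below are placeholders for illustration, not actual S3 Gateway endpoints or options.

```python
# Hedged sketch: fetching an object through an HTTPS proxy in front of S3
# using a WLCG-style bearer token. Endpoint and paths are placeholders.
import requests

def read_object(gateway_url: str, object_path: str, token_file: str) -> bytes:
    token = open(token_file).read().strip()
    resp = requests.get(
        f"{gateway_url}/{object_path.lstrip('/')}",
        headers={"Authorization": f"Bearer {token}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.content

# Example (hypothetical endpoint, bucket/key and token location):
# data = read_object("https://s3gw.example.org:1094",
#                    "mybucket/run123/file.root", "/tmp/bt_u1000")
```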
There has been a significant increase in data volume from various large scientific projects, including the Large Hadron Collider (LHC) experiment. The High Energy Physics (HEP) community will need to move ever larger data volumes over the network, as it expects its annual data volume to grow almost thirty-fold between 2018 and 2028 [1]. To mitigate the repetitive data access issue and network...
Current and future distributed HENP data analysis infrastructures rely increasingly on object stores in addition to regular remote file systems. Such file-less storage systems are popular as a means to escape the inherent scalability limits of the POSIX file system API. Cloud storage is already dominated by S3-like object stores, and HPC sites are starting to take advantage of object stores...
At Brookhaven National Lab, the dCache storage management system is used as a disk cache for large high-energy physics (HEP) datasets, primarily from the ATLAS experiment [1]. Storage space on dCache is considerably smaller than the full ATLAS data collection. Therefore, a policy is needed to determine which data files to keep in the cache and which files to evict. A good policy is to keep...
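For illustration only, the sketch below shows a least-recently-used (LRU) eviction policy of the kind a disk cache smaller than the full data collection might apply; the file names, sizes and capacity handling are assumptions, not the policy actually deployed at BNL.

```python
# Toy LRU eviction policy for a fixed-capacity disk cache.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()          # path -> size, oldest first

    def access(self, path: str, size: int) -> list:
        """Record an access; return the files evicted to make room."""
        evicted = []
        if path in self.files:
            self.files.move_to_end(path)    # refresh recency on a cache hit
            return evicted
        while self.used + size > self.capacity and self.files:
            old_path, old_size = self.files.popitem(last=False)
            self.used -= old_size
            evicted.append(old_path)
        self.files[path] = size
        self.used += size
        return evicted
```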
In this talk, we present a novel data format design that obviates the need for data tiers by storing individual event data products in column objects. The objects are stored and retrieved through Ceph S3 technology, and a companion metadata system handles tracking of the object lifecycle. Performance benchmarks of data storage and retrieval will be presented, along with scaling tests of the...
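As a hedged sketch of the general approach, the example below writes individual event-data products as column objects into an S3-compatible store and records them in a companion metadata structure. The endpoint, bucket, key layout and metadata fields are assumptions for illustration, not the system described in the abstract.

```python
# Hedged sketch: per-product column objects in an S3-compatible store
# (e.g. Ceph RGW), tracked by a companion metadata record.
import json
import boto3

s3 = boto3.client("s3", endpoint_url="https://ceph-rgw.example.org")  # placeholder

def put_column(bucket: str, dataset: str, event_range: str,
               product: str, payload: bytes, catalogue: dict) -> str:
    key = f"{dataset}/{product}/{event_range}"
    s3.put_object(Bucket=bucket, Key=key, Body=payload)
    # Track the object's lifecycle in the companion metadata record.
    catalogue.setdefault(dataset, []).append(
        {"product": product, "events": event_range, "key": key, "state": "closed"})
    return key

catalogue = {}
put_column("events", "run2023A", "000001-000500", "tracks",
           b"<serialized column data>", catalogue)
print(json.dumps(catalogue, indent=2))
```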
Rucio is a software framework that provides scientific collaborations with the ability to organise, manage and access large volumes of data using customisable policies. The data can be spread across globally distributed locations and across heterogeneous data centres, uniting different storage and network technologies as a single federated entity. Rucio offers advanced features such as...
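A brief, hedged example of the policy-driven replication Rucio exposes through its Python client; the scope, dataset name, RSE expression and lifetime below are placeholders, and the call assumes an already configured Rucio client environment (consult the Rucio documentation for authoritative signatures).

```python
# Hedged sketch: declaring a replication rule with the Rucio Python client.
from rucio.client import Client

client = Client()

# Ask Rucio to keep two copies of a dataset on any Tier-1 disk endpoint,
# expiring the rule after 30 days (lifetime is given in seconds).
client.add_replication_rule(
    dids=[{"scope": "user.jdoe", "name": "analysis_output_2024"}],  # placeholder DID
    copies=2,
    rse_expression="tier=1&type=DISK",                              # placeholder RSEs
    lifetime=30 * 24 * 3600,
)
```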
The goal of the “HTTP REST API for Tape” project is to provide a simple, minimalistic and uniform interface to manage data transfers between Storage Endpoints (SEs) where the source file is on tape. The project is a collaboration between the developers of WLCG storage systems (EOS+CTA, dCache, StoRM) and data transfer clients (gfal2, FTS). For some years, HTTP has been growing in popularity as...
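The sketch below illustrates, under stated assumptions, how a client might issue a bulk stage request to such an interface over HTTP; the endpoint path, JSON fields and token handling are illustrative, and the authoritative definition is the specification produced by the project.

```python
# Hedged sketch of a bulk stage (recall-from-tape) request over HTTP.
import requests

def request_stage(endpoint: str, paths: list, token: str) -> str:
    body = {"files": [{"path": p} for p in paths]}        # assumed payload shape
    resp = requests.post(
        f"{endpoint}/stage",                               # assumed endpoint path
        json=body,
        headers={"Authorization": f"Bearer {token}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json().get("requestId", "")

# request_id = request_stage("https://se.example.org:8444/api/v1",
#                            ["/archive/run123/file1.root"], token="...")
```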
CDS (Custodial Disk Storage), a disk-based custodial storage system powered by the CERN EOS storage system, has been operating for the ALICE experiment at the KISTI Tier-1 Centre since November 2021. The CDS replaced the existing tape storage operated for almost a decade, after its stable demonstration in the WLCG Tape Challenges in October 2021. We tried to challenge the economy of tape storage in the...
The High Luminosity upgrade to the LHC (HL-LHC) is expected to deliver scientific data at the multi-exabyte scale. In order to address this unprecedented data storage challenge, the ATLAS experiment launched the Data Carousel project in 2018. Data Carousel is a tape-driven workflow whereby bulk production campaigns with input data resident on tape are executed by staging and promptly...
The CERN IT Department is responsible for ensuring the integrity and security of data stored in the IT Storage Services. General storage backends such as EOSHOME/PROJECT/MEDIA and CEPHFS are used to store data for a wide range of use cases for all stakeholders at CERN, including experiment project spaces and user home directories.
In recent years a backup system, CBACK, was developed based...
The CERN Tape Archive (CTA) was conceived as the successor to CASTOR and as the tape back-end to EOS, designed for the archival storage of data from LHC Run-3 and other experimental programmes at CERN. In the wider WLCG, the tape software landscape is quite heterogeneous, but we are now entering a period of consolidation. This has led to a number of sites in WLCG (and beyond) reevaluating their...
The development of an LHC physics analysis involves numerous investigations that require the repeated processing of terabytes of measured and simulated data. Thus, a rapid processing turnaround is beneficial to the scientific process. We identified two bottlenecks in analysis-independent algorithms and developed the following solutions.
First, inputs are now cached on individual SSD caches of...
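A minimal sketch, assuming a node-local SSD mount and XRootD-accessible inputs, of the cache-or-fetch pattern such an SSD cache implies: reuse a local copy if present, otherwise fetch it once with xrdcp. The cache directory and URL handling are illustrative assumptions.

```python
# Hedged sketch of a node-local SSD cache for analysis inputs.
import hashlib
import pathlib
import subprocess

CACHE_DIR = pathlib.Path("/scratch/ssd-cache")   # assumed node-local SSD mount

def cached_input(remote_url: str) -> pathlib.Path:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    local = CACHE_DIR / hashlib.sha1(remote_url.encode()).hexdigest()
    if not local.exists():
        # Fetch the input once; subsequent reads hit the local SSD copy.
        subprocess.run(["xrdcp", remote_url, str(local)], check=True)
    return local
```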
Rucio is data management software that has become a de facto standard in the HEP community and beyond. It allows the management of large volumes of data over their full lifecycle. The Belle II experiment, located at KEK (Japan), recently moved to Rucio to manage its data over the coming decade (O(10) PB/year). In addition to its data management functionalities, Rucio also provides support for...
The ATLAS experiment is preparing a major change in the conditions data infrastructure in view of Run 4. In this presentation we will present the main motivations for the new design (called CREST, for Conditions-REST), the ongoing changes in the DB architecture, and the developments for the deployment of the new system. The main goal is to set up a parallel infrastructure for full-scale...
The ALICE experiment at CERN has undergone a substantial detector, readout and software upgrade for LHC Run 3. A signature part of the upgrade is the triggerless detector readout, which necessitates real-time lossy data compression from 1.1 TB/s to 100 GB/s, performed on a GPU/CPU cluster of 250 nodes. To perform this compression, a significant part of the software, which traditionally is...
The HSF Conditions Databases activity is a forum for cross-experiment discussions hoping for as broad a participation as possible. It grew out of the HSF Community White Paper work to study conditions data access, where experts from ATLAS, Belle II, and CMS converged on a common language and proposed a schema that represents best practice. The focus of the HSF work is the most difficult use...
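As an illustration of the kind of layout such best-practice discussions converge on (global tags grouping per-subsystem tags, each tag mapping intervals of validity to immutable payloads addressed by hash), the sketch below builds a toy relational version; the table and column names are assumptions, not the HSF schema itself.

```python
# Toy conditions-data layout: global tags -> tags -> IOVs -> payloads by hash.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payload    (hash TEXT PRIMARY KEY, data BLOB);
CREATE TABLE tag        (name TEXT PRIMARY KEY, time_type TEXT);
CREATE TABLE iov        (tag_name TEXT REFERENCES tag(name),
                         since INTEGER,
                         payload_hash TEXT REFERENCES payload(hash),
                         PRIMARY KEY (tag_name, since));
CREATE TABLE global_tag (name TEXT, tag_name TEXT REFERENCES tag(name),
                         PRIMARY KEY (name, tag_name));
""")

conn.execute("INSERT INTO payload VALUES (?, ?)", ("abc123", b"\x00\x01"))
conn.execute("INSERT INTO tag VALUES (?, ?)", ("PixelAlignment-v1", "run-lumi"))
conn.execute("INSERT INTO iov VALUES (?, ?, ?)", ("PixelAlignment-v1", 0, "abc123"))

def payload_at(tag_name: str, time: int):
    """Resolve the payload valid at a given time for one tag."""
    row = conn.execute(
        """SELECT p.data FROM iov i JOIN payload p ON p.hash = i.payload_hash
           WHERE i.tag_name = ? AND i.since <= ?
           ORDER BY i.since DESC LIMIT 1""",
        (tag_name, time)).fetchone()
    return row[0] if row else None

print(payload_at("PixelAlignment-v1", 42))
```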
The ATLAS EventIndex is a global catalogue of the events collected, processed or generated by the ATLAS experiment. The system was upgraded in advance of LHC Run 3, with a migration of the Run 1 and Run 2 data from HDFS MapFiles to HBase tables with a Phoenix interface. The frameworks for testing functionality and performance of the new system have been developed. There are two types of tests...
The Data Lake concept has promised increased value to science and more efficient operations for storage compared to the traditional isolated storage deployments. Building on the established distributed dCache serving as the Nordic Tier-1 storage for LHC data, we have also integrated tier-2 pledged storage in Slovenia, Sweden, and Switzerland, resulting in a coherent storage space well above...
China’s High Energy Photon Source (HEPS), the first national high-energy synchrotron radiation light source and soon one of the world’s brightest fourth-generation synchrotron radiation facilities, is under intense construction in Beijing’s Huairou District and will be completed in 2025.
To make sure that the huge amount of data collected at HEPS is accurate, available and...
The Vera C. Rubin Observatory is preparing for execution of the most ambitious astronomical survey ever attempted, the Legacy Survey of Space and Time (LSST). Currently in its final phase of construction in the Andes mountains in Chile and due to start operations in late 2024 for 10 years, its 8.4-meter telescope will nightly scan the southern sky and collect images of the entire visible sky...
ALICE is one of the four large experiments at the CERN LHC designed to study the structure and origins of matter in collisions of heavy ions (and protons) at ultra-relativistic energies. The experiment measures the particles produced as a result of collisions in its center so that it can reconstruct and study the evolution of the system produced during these collisions. To perform these...
The File Transfer System (FTS) is a software system responsible for queuing, scheduling, dispatching and retrying file transfer requests. It is used by three of the LHC experiments, namely ATLAS, CMS and LHCb, as well as by non-LHC experiments including AMS, DUNE and NA62. FTS is critical to the success of many experiments and the service must remain available and performant during the entire...
HPC systems are increasingly used to address various challenges in high-energy physics, but the data infrastructures used in this field are often not well integrated with infrastructures that include HPC resources. Here we will focus on a specific infrastructure, namely Fenix, which is based on a consortium of six leading European supercomputing centres. The Fenix sites are...
The rapid growth of scientific data and the computational needs of BNL-supported science programs will bring the Scientific Data and Computing Center (SDCC) to the Exabyte scale in the next few years. The SDCC Storage team is responsible for the symbiotic development and operations of storage services for all BNL experiment data, in particular for the data generated by the ATLAS experiment...
In the HEP community, the prediction of data popularity is a topic that has been approached for many years. Nonetheless, while facing increasing data storage challenges, especially in the HL-LHC era, we are still in need of better predictive models to answer the questions of whether particular data should be kept, replicated, or deleted.
The usage of caches proved to be a convenient...
Complete and reliable monitoring of the WLCG data transfers is an important condition for effective computing operations of the LHC experiments. WLCG data challenges organised in 2021 and 2022 highlighted the need for improvements in the monitoring of data traffic on the WLCG infrastructure. In particular, it concerns the implementation of the monitoring of the remote data access via the...
Due to the increased network traffic demand expected during the HL-LHC era, the T2 sites in the USA will be required to have 400 Gbps of available bandwidth to their storage solution.
With this in mind, we are pursuing a scale test of the XRootD software when used to perform Third Party Copy transfers using the HTTP protocol. Our main objective is to understand the possible limitations in...
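A hedged sketch of what an HTTP third-party-copy request looks like at the protocol level in push mode: a COPY request is sent to the source endpoint with a Destination header, and a credential for the remote side is forwarded in a transfer header. The URLs, tokens and exact header set below are illustrative; the HTTP-TPC specification is authoritative.

```python
# Hedged sketch: initiating an HTTP third-party copy (push mode).
import requests

def third_party_copy(source_url: str, dest_url: str,
                     source_token: str, dest_token: str) -> int:
    resp = requests.request(
        "COPY",
        source_url,
        headers={
            "Destination": dest_url,
            "Authorization": f"Bearer {source_token}",
            # Forwarded by the source server to authenticate at the destination:
            "TransferHeaderAuthorization": f"Bearer {dest_token}",
        },
        timeout=300,
    )
    return resp.status_code

# status = third_party_copy("https://src.example.org:1094/store/file.root",
#                           "https://dst.example.org:1094/store/file.root",
#                           source_token="...", dest_token="...")
```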
In preparation for the second runs of the ProtoDUNE detectors at CERN (NP02 and NP04), DUNE has established a new data pipeline for bringing the data from the EHN-1 experimental hall at CERN to primary tape storage at Fermilab and CERN, and then spreading it out to a distributed disk data store at many locations around the world. This system includes a new Ingest Daemon and a new Declaration...
The LArSoft/art framework is used at Fermilab’s liquid argon time projection chamber experiments such as ICARUS to run traditional production workflows in a grid environment. It has become increasingly important to utilize HPC facilities for experimental data processing tasks. As part of the SciDAC-4 HEP Data Analytics on HPC and HEP Event Reconstruction with Cutting Edge Computing...