Current and future distributed HENP data analysis infrastructures rely increasingly on object stores in addition to regular remote file systems. Such file-less storage systems are popular as a means to escape the inherent scalability limits of the POSIX file system API. Cloud storage is already dominated by S3-like object stores, and HPC sites are starting to take advantage of object stores for the next generation of supercomputers. In light of this, ROOT's new I/O subsystem RNTuple has been engineered to support object stores alongside (distributed) file systems as first class citizens, while also addressing performance bottlenecks and interface shortcomings of its predecessor, TTree I/O.
In this contribution, we describe the improvements around RNTuple’s support for object stores, expounding on the challenges and insights toward efficient storage and high-throughput data transfers. Specifically, we introduce RNTuple’s native backend for the Amazon S3 cloud storage and present the latest developments in our Intel DAOS backend, demonstrating RNTuple’s integration with next-generation HPC sites.
Through experimental evaluations, we compare the two backends in single node and distributed end-to-end analyses using ROOT’s RDataFrame, proving Amazon S3 and Intel DAOS as viable HENP storage providers.
|Consider for long presentation||No|