Speaker
Description
Modern HEP workflows must manage increasingly large and complex data collections. HPC facilities may be employed to help meet these workflows' growing data processing needs. However, a better understanding of the I/O patterns and underlying bottlenecks of these workflows is necessary to meet the performance expectations of HPC systems.
Darshan is a lightweight I/O characterization tool that captures concise views of HPC application I/O behavior. It intercepts application I/O calls at runtime, records file access statistics for each process, and generates log files detailing application I/O access patterns.
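The interception idea described above can be illustrated with a minimal Python sketch. This is not Darshan's implementation or API: Darshan instruments I/O libraries beneath the application, whereas this toy example wraps Python file objects. The names `IOCounters`, `TracedFile`, and `demo` are hypothetical, chosen only to show how per-file counters might accumulate as calls pass through a wrapper.

```python
import os
import tempfile
from collections import defaultdict


class IOCounters:
    """Holds one record of counters per file path (hypothetical layout)."""

    def __init__(self):
        self.stats = defaultdict(lambda: {
            "opens": 0, "reads": 0, "writes": 0,
            "bytes_read": 0, "bytes_written": 0,
        })


class TracedFile:
    """Wraps a file object and updates counters on each I/O call."""

    def __init__(self, path, mode, counters):
        self._record = counters.stats[path]
        self._fh = open(path, mode)
        self._record["opens"] += 1

    def read(self, size=-1):
        data = self._fh.read(size)
        # Count every call, including the final one that returns no data.
        self._record["reads"] += 1
        self._record["bytes_read"] += len(data)
        return data

    def write(self, data):
        written = self._fh.write(data)
        self._record["writes"] += 1
        self._record["bytes_written"] += written
        return written

    def close(self):
        self._fh.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


def demo():
    """Write 4 KiB in 1 KiB chunks, read it back in 512-byte chunks."""
    counters = IOCounters()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "events.bin")
        with TracedFile(path, "wb", counters) as f:
            for _ in range(4):
                f.write(b"\0" * 1024)
        with TracedFile(path, "rb", counters) as f:
            while f.read(512):
                pass
        return dict(counters.stats[path])


if __name__ == "__main__":
    print(demo())
```

Counters of this kind (operation counts and byte totals per file) are what make small-read or many-open access patterns visible after the run, without recording every individual call.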
Typical HEP workflows include event generation, detector simulation, event reconstruction, and subsequent analysis stages. A study of the I/O behavior of ATLAS simulation and DAOD_PHYS/DAOD_PHYSLITE production, CMS simulation, and DUNE analysis workloads using Darshan is presented. Characterizing the various stages at scale would guide further tuning of I/O patterns with real HEP workloads in order to better inform storage capability requirements at facilities, uncover I/O bottlenecks in current workflows when deployed at scale, and provide recommendations on data formats and access patterns for future HEP workloads.
Consider for long presentation: No