Speaker
Description
dCache has been in production at BNL for almost two decades. For years, it used the default driver included with the dCache software to interface with the HPSS tape storage system. Because this approach is synchronous and its periodic script invocations are resource-intensive, scalability was significantly limited. The WLCG tape challenges exposed bottlenecks in dCache staging: staging servers suffered from high load and out-of-memory problems, and became nonfunctional once restore requests reached 120K or more.
To remove these bottlenecks, BNL adapted and further developed ENDIT, an efficient dCache interface to HPSS tape storage that was initially developed by the Nordic Data Grid Facility. ENDIT eliminates the high load on the dCache staging hosts and increases the number of simultaneous staging requests passed to BNL’s HPSS Batch (ERADAT) system, so that reads from HPSS can be more efficient and better optimized. The result is a scalable staging solution with significantly better overall tape staging performance. Because ENDIT is asynchronous, it also provides a more flexible and controllable way to handle flush requests. A further benefit is adjustable control over the number of simultaneous reads and writes to HPSS, which prevents overburdening a pool host or stressing the HPSS gateway. In addition, the changes make it possible to add new features, namely monitoring and analytics, metadata support, and smart writing; work on these has already begun and will continue.
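The idea of capping simultaneous reads while still accepting a large backlog of asynchronous requests can be illustrated with a minimal Python sketch. This is not the ENDIT or ERADAT implementation: the function names, the file paths, and the limit of 8 are hypothetical, and the tape read is replaced by a no-op placeholder. It only shows how a semaphore bounds the number of in-flight operations regardless of how many requests are queued.

```python
import asyncio


async def stage_file(path, limiter, stats):
    """Simulate staging one file while holding a concurrency slot (hypothetical)."""
    async with limiter:  # wait here until a slot is free
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0)  # placeholder for the actual tape read
        stats["active"] -= 1
    return path


async def stage_all(paths, max_concurrent):
    """Queue every request at once, but run at most max_concurrent together."""
    limiter = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}
    staged = await asyncio.gather(
        *(stage_file(p, limiter, stats) for p in paths)
    )
    return staged, stats["peak"]


def run_demo(n_requests=1000, max_concurrent=8):
    """Submit n_requests hypothetical restore requests and report peak concurrency."""
    paths = [f"/hypothetical/file_{i}" for i in range(n_requests)]
    return asyncio.run(stage_all(paths, max_concurrent))
```

Calling `run_demo()` returns the list of staged paths and the highest number of reads that were ever in flight, which never exceeds `max_concurrent`. Making the limit a parameter mirrors the adjustable control described above, under the assumption that one limit applies per pool host.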
The BNL ENDIT staging component has been running stably for nine months and has delivered noticeable performance improvements since its deployment in ATLAS dCache production. It has alleviated the heavy load on the staging hosts, and the whole system has handled 140K staging requests without any issues. Throughput reached up to 7 GB/s during the 2022 WLCG Tape Challenge. In the near future, we intend to pursue more aggressive staging tests through the WLCG Tape Challenge, and we will include performance statistics in our paper. The ENDIT flushing component has also been deployed to ATLAS dCache and is working well.
We plan to extend this development as a standard solution for other experiments, such as Belle II. ENDIT will also serve as a centerpiece for upcoming changes to the experiments' tape usage, such as smart writing and the propagation of user-defined metadata to HPSS to support different writing and reading strategies.
Consider for long presentation: No