The rapid growth of scientific data and the computational needs of BNL-supported science programs will bring the Scientific Data and Computing Center (SDCC) to the exabyte scale within the next few years. The SDCC Storage team is responsible for the development and operation of storage services for all BNL experiment data, in particular for the ATLAS experiment, which produces the largest volume of data. The steady increase in ATLAS disk capacity requirements, the cost of maintaining more than one disk copy of the data, and the evolving ATLAS storage environment have brought new challenges to the SDCC. To overcome the challenges arising from this vast amount of data while enabling efficient and cost-effective data analysis in a large-scale, multi-tiered storage architecture, the Storage team has undertaken a thorough analysis of the ATLAS experiment's requirements, matched them to appropriate storage options and strategies, and explored alternatives to complement or replace our current storage solution. In this paper, we present the main challenges of supporting several big-data experiments such as ATLAS. We describe the experiment's requirements and priorities, in particular the storage system characteristics that are critical for the high-luminosity run, and how the key storage components provided by the Storage team work together: the dCache disk storage system, its HPSS archival back-end, and its OS-level storage back-end. In particular, we investigate a new solution that integrates Lustre and XRootd, in which Lustre serves as the backend storage and XRootd acts as a frontend access layer supporting the different grid access protocols. We also describe the validation and commissioning tests, and compare the performance of dCache and XRootd. In addition, we present a performance and cost comparison of OpenZFS and Linux MD RAID, an evaluation of storage software stacks, and the stress tests used to validate Third-Party Copy (TPC).
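The Lustre/XRootd integration described above can be illustrated with a minimal configuration sketch. Since Lustre presents a POSIX filesystem, a standalone XRootd server can export the Lustre mount point directly and layer grid protocols (root:// and HTTP(S), including TPC) on top of it. The paths and port below are hypothetical placeholders, not the actual SDCC deployment:

```
# Illustrative XRootd config: serve a Lustre POSIX mount as grid-accessible storage.
# /mnt/lustre/atlas, the admin/pid paths, and port 8443 are assumed example values.
all.export /mnt/lustre/atlas
all.adminpath /var/spool/xrootd
all.pidpath /var/run/xrootd

# Enable HTTP(S) alongside the native root:// protocol via the xrdhttp plugin
xrd.protocol http:8443 libXrdHttp.so

# Enable HTTP Third-Party Copy support
http.exthandler xrdtpc libXrdHttpTPC.so
```

In this arrangement XRootd performs no data management of its own; Lustre handles striping and redundancy, while XRootd provides authentication, protocol translation, and TPC orchestration at the access layer.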