May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Solving the dilemma of big data: live storage constraints versus full scientific reach

Not scheduled
1h
Hampton Roads Ballroom and Foyer Area (Norfolk Waterside Marriott)

235 East Main Street, Norfolk, VA 23510
Poster (Poster Session)

Speaker

Van Buren, Gene (Brookhaven National Laboratory)

Description

Cutting-edge research has driven scientists in many fields into the world of big data. While data storage technologies continue to evolve, the cost of rapid data access at such scales remains high and is a major factor in planning and operations. In a joint effort spanning experiment scientists, ROOT developers, and industrial leaders in data compression, we sought to address this concern from multiple angles and have brought a new tool to the nuclear and high energy physics communities and beyond.

The first aspect of this effort is to ensure that high compression factors can be achieved. Ultimately, lossy compression provides the lever arm to reach any desired factor. Using actual data from the STAR and CMS experiments, we have found that the BLAST compression tool developed and provided by Accelogic LLC can cut storage needs by more than half relative to standard tools without affecting typical analyses.
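As a rough illustration of why discarding precision raises the achievable compression factor, the sketch below zeroes low-order mantissa bits of 32-bit floats before handing them to a standard lossless compressor. This is not the BLAST algorithm and does not use any ROOT API; the function names, the synthetic Gaussian data, and the choice of 10 retained mantissa bits are all assumptions made for the example.

```python
import random
import struct
import zlib


def truncate_mantissa(value, keep_bits):
    """Zero the lowest (23 - keep_bits) mantissa bits of a float32 value.

    Hypothetical helper for illustration only; not how BLAST works.
    """
    drop = 23 - keep_bits
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    bits &= (0xFFFFFFFF << drop) & 0xFFFFFFFF
    (out,) = struct.unpack("<f", struct.pack("<I", bits))
    return out


def compressed_size(values):
    """Pack values as float32 and return the zlib-compressed size in bytes."""
    raw = struct.pack("<%df" % len(values), *values)
    return len(zlib.compress(raw, 9))


# Synthetic stand-in for detector data (an assumption, not real STAR/CMS data).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(10000)]

lossless_size = compressed_size(data)
lossy_size = compressed_size([truncate_mantissa(v, 10) for v in data])
print("lossless bytes:", lossless_size, "lossy bytes:", lossy_size)
```

The trade-off is explicit here: keeping 10 mantissa bits bounds the relative error of each value at roughly one part in a thousand, and the zeroed bits become highly compressible. A production tool would choose the retained precision per quantity rather than globally.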

However, the same degree of lossy compression may not be acceptable for all analysis scenarios. This compels data managers to store data with the smallest precision loss acceptable to every possible analysis, even if many analyses can afford less precision. We have developed the Precision Cascade mechanism as an answer: tiered storage files that provide higher precision when demanded and a smaller storage footprint when needed.
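One way to picture tiered storage is to split each value into a coarse part and a residual part that can live in separate files. The sketch below does this at the bit level for 32-bit floats; it is only an assumed illustration of the general idea, not the Precision Cascade implementation, and the names `split_tiers` and `read_values` and the 16/16 bit split are hypothetical.

```python
import struct


def split_tiers(values, coarse_bits=16):
    """Split each float32 bit pattern into a coarse tier and a residual tier."""
    fine_bits = 32 - coarse_bits
    coarse, residual = [], []
    for v in values:
        (bits,) = struct.unpack("<I", struct.pack("<f", v))
        coarse.append(bits >> fine_bits)                # sign, exponent, top mantissa bits
        residual.append(bits & ((1 << fine_bits) - 1))  # remaining low mantissa bits
    return coarse, residual


def read_values(coarse, residual=None, coarse_bits=16):
    """Rebuild values from the coarse tier, adding the residual tier if present."""
    fine_bits = 32 - coarse_bits
    out = []
    for i, c in enumerate(coarse):
        bits = c << fine_bits
        if residual is not None:
            bits |= residual[i]
        out.append(struct.unpack("<f", struct.pack("<I", bits))[0])
    return out


values = [1.5, -3.14159, 1234.5678]
# Reference values after rounding to float32, the assumed on-disk precision.
original = [struct.unpack("<f", struct.pack("<f", v))[0] for v in values]

coarse, residual = split_tiers(values)
approx = read_values(coarse)          # coarse tier only: smaller, reduced precision
full = read_values(coarse, residual)  # both tiers: bit-exact float32 values
```

In this picture, an analysis that tolerates reduced precision reads only the coarse tier, while one that needs full precision also fetches the residual tier, which could sit on cheaper or slower storage.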

Having incorporated these technologies into ROOT, we present here our findings and initial experiences. We will demonstrate both the capability of the tools to reduce the footprint on live storage for studies where full precision is not the limiting factor, and the delivery of lossless data for the science that requires it.

Consider for long presentation: No

Primary authors

Dr Cali, Ivan A. (Massachusetts Institute of Technology)
Canal, Philippe
Dr Gonzalez, Juan (Accelogic LLC)
Lauret, Jerome (Brookhaven Science Associates)
Dr Nunez, Rafael (Accelogic LLC)
Van Buren, Gene (Brookhaven National Laboratory)
Ms Ying, Yueyang (Massachusetts Institute of Technology)

Co-author

Prof. Burtscher, Martin (Texas State University)
