May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Laurelin: A ROOT I/O implementation for Apache Spark

May 9, 2023, 3:15 PM
15m
Hampton Roads Ballroom VIII (Norfolk Waterside Marriott)
235 East Main Street, Norfolk, VA 23510
Oral, Track 6 - Physics Analysis Tools

Speaker

Melo, Andrew (Vanderbilt University)

Description

Apache Spark is a distributed computing framework that can process very large datasets on large clusters of servers. Laurelin is a Java-based implementation of ROOT I/O that allows Spark to read and write ROOT files from common HEP storage systems without depending on the C++ implementation of ROOT. We discuss the improvements gained by migrating to an Arrow-based in-memory representation, and we detail the performance difference between analyses over data stored in the ROOT format and in the Parquet format.

Consider for long presentation: No

Primary author

Melo, Andrew (Vanderbilt University)
