May 8 – 12, 2023
Norfolk Waterside Marriott
Laurelin: A ROOT I/O implementation for Apache Spark

May 9, 2023, 3:15 PM
Hampton Roads Ballroom VIII (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Oral, Track 6 - Physics Analysis Tools


Melo, Andrew (Vanderbilt University)


Apache Spark is a distributed computing framework that can process very large datasets on large clusters of servers. Laurelin is a Java-based implementation of ROOT I/O that allows Spark to read and write ROOT files from common HEP storage systems without depending on the C++ implementation of ROOT. We discuss the improvements brought by the migration to an Arrow-based in-memory representation and detail the performance differences between analyses over data stored in the ROOT format and in the Parquet format.

Primary author

Melo, Andrew (Vanderbilt University)

