Speaker
Melo, Andrew
(Vanderbilt University)
Description
Apache Spark is a distributed computing framework that processes very large datasets across large clusters of servers. Laurelin is a Java-based implementation of ROOT I/O that allows Spark to read and write ROOT files from common HEP storage systems without depending on the C++ implementation of ROOT. We discuss improvements from the migration to an Arrow-based in-memory representation and detail the performance difference between analyses over data stored in ROOT and in the Parquet format.
Consider for long presentation
No
Primary author
Melo, Andrew
(Vanderbilt University)