Data analysis in particle physics frequently requires variable-length, nested data structures, such as an arbitrary number of particles per event, as well as combinatorial operations to search for particle decays. The Awkward Array library provides arrays of such data types.
The previous version of the library was implemented in C++, which hindered its growth. To overcome this limitation, Awkward Array has been deeply restructured to enable integration with other libraries, while preserving its existing high-level API and its performance-critical C++ algorithms. In the 2.0 release, 50k lines of C++ were converted into 20k lines of Python.
In this talk, we present the design and features of Awkward Array 2.0 and showcase the ecosystem that has grown out of the library's restructuring. First, the restructuring lays the groundwork for full CUDA integration: Awkward Arrays can be copied to a GPU. Second, conversions are now available between Awkward Arrays and ROOT's RDataFrame, Apache Arrow, and Parquet. Finally, several libraries have been integrated:
- Dask: Awkward Array calculations can now be delayed and distributed for parallel processing;
- JAX: functions involving Awkward Arrays can be differentiated;
- pandas: DataFrames can contain Awkward structures.
Awkward Array 2.0 was released at the end of 2022 and is now available for physics research.