The recent release of Awkward Array 2.0 significantly changes how lazy evaluation and task-graph building are handled in columnar analysis. Awkward Array now delegates this functionality to the Dask parallel processing library, which opens new ways to optimize columnar analyses and distribute them on clusters. In particular, a task graph can now be optimized all the way to the user code, possibly obviating the “processor” pattern that Coffea has relied upon until now. Taking full advantage of this functionality required a major retooling of Coffea for the new infrastructure, resulting in a more extensible and more easily maintainable codebase that depends on the dask-awkward and dask-histogram packages. We will present performance benchmarks comparing Coffea releases based on Awkward Array 1.0 and Awkward Array 2.0, as well as processor-based and fully Dask-optimized compute graphs in Awkward Array 2.0.
|Consider for long presentation||No|