May 8 – 12, 2023
Norfolk Waterside Marriott
US/Eastern timezone

Toward Ten-Minute Turnaround in CMS Data Analysis: The View from Notre Dame

May 8, 2023, 2:45 PM
15m
Marriott Ballroom IV (Norfolk Waterside Marriott)

235 East Main Street Norfolk, VA 23510
Oral
Track 7 - Facilities and Virtualization

Speaker

Lawrence, John (University of Notre Dame)

Description

Effective analysis computing requires rapid turnaround to enable frequent iteration, adjustment, and exploration, leading to discovery. Considering raw hardware capability alone, an informal goal of reducing 10 TB of experimental data in about ten minutes on campus-scale computing infrastructure is achievable. However, compared to production computing, which seeks to maximize throughput at massive scale over weeks and months, analysis computing requires different optimizations in terms of startup latency, data locality, scalability limits, and long-tail behavior. At Notre Dame, we have developed substantial experience running scalable analysis codes on campus infrastructure on a daily basis. Using the TopEFT application, based on the Coffea data analysis framework and the Work Queue distributed executor, we reliably run analyses of 2 TB of data, totaling 375 CPU-hours, to completion in about one hour on hundreds of nodes, albeit with high variability due to competing system loads. The Python environment needed on the compute nodes is set up and cached on the fly when required (a 300 MB tarball sent to worker nodes, 1 GB unpacked). In this talk, we present our analysis of the performance limits of the current system, taking into account software dependencies, data access, result generation, and fault tolerance. We also present our plans for attacking the ten-minute goal through a combination of hardware evolution, improved storage management, and application scheduling.
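
The gap between the current system and the ten-minute goal can be made concrete with a rough calculation. The sketch below uses only the figures quoted above; it assumes decimal units (1 TB = 10^12 bytes), CPU cost that scales linearly with data volume, and perfect parallelism. None of these assumptions is stated in the abstract, and the function names are illustrative only.

```python
# Back-of-envelope scaling from the figures quoted in the abstract.
# Assumptions (not from the talk): decimal units (1 TB = 1e12 bytes),
# CPU cost linear in data volume, and perfect parallel efficiency.

TB = 1e12  # bytes


def aggregate_throughput_gb_s(data_tb, wall_seconds):
    """Sustained aggregate data rate, in GB/s, needed to finish on time."""
    return data_tb * TB / wall_seconds / 1e9


def cores_needed(cpu_hours, wall_seconds):
    """Average number of busy cores if the work spreads evenly over the wall time."""
    return cpu_hours * 3600 / wall_seconds


# Current system: 2 TB, 375 CPU-hours, about one hour.
current_rate = aggregate_throughput_gb_s(2, 3600)
current_cores = cores_needed(375, 3600)

# Goal: 10 TB in ten minutes, with CPU cost scaled from 375 CPU-hours per 2 TB.
goal_cpu_hours = 375 * (10 / 2)
goal_rate = aggregate_throughput_gb_s(10, 600)
goal_cores = cores_needed(goal_cpu_hours, 600)

print(f"current: {current_rate:.2f} GB/s on about {current_cores:.0f} busy cores")
print(f"goal:    {goal_rate:.2f} GB/s on about {goal_cores:.0f} busy cores")
print(f"required increase in data rate: {goal_rate / current_rate:.0f}x")
```

Under these assumptions, the goal corresponds to roughly a 30-fold increase in sustained data rate and in the average number of busy cores. This is an idealized estimate: it leaves out the startup latency, environment distribution, and long-tail effects that the abstract identifies as the practical limits.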

Consider for long presentation: No

Primary authors

Dr Tovar, Benjamin (University of Notre Dame)
Mohrman, Kelci (University of Notre Dame)
Prof. Thain, Douglas (University of Notre Dame)
Prof. Lannon, Kevin (University of Notre Dame)