The Large High Altitude Air Shower Observatory (LHAASO) is a large-scale astrophysics experiment led by China. Its offline data processing has depended heavily on the local cluster and local file system of the Institute of High Energy Physics (IHEP).
The computing resources of the LHAASO collaboration's member groups are geographically distributed, and most of them are small in scale, have low stability, and lack dedicated support staff, which makes them difficult to integrate via the Grid. We therefore designed and developed a lightweight distributed computing system for LHAASO offline data processing. Unlike the grid model, the system keeps the IHEP cluster as the main cluster and extends it to worker nodes at remote sites. LHAASO jobs are submitted to the IHEP cluster and dispatched by the system to remote worker nodes.
Tokens provide authentication and authorization across the whole cluster. LHAASO computing tasks are classified into several types, and each type of job is wrapped by a dedicated script so that the job does not access the IHEP file system directly. The system draws on the “startd automatic cluster joining” idea of GlideinWMS but abandons grid certificate authentication.
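The per-type job wrapping described above might be sketched as follows. This is a minimal illustration and not the LHAASO implementation: the job types, file patterns, and function names (`JOB_TYPES`, `stage_in`, `stage_out`, `run_wrapped`) are hypothetical, and a local file copy stands in for whatever token-authenticated transfer the real system uses. Running the payload in a scratch directory only illustrates the data flow; it does not by itself enforce file system isolation.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical job types, each with the output patterns its wrapper stages out.
JOB_TYPES = {
    "simulation": ["*.sim"],
    "reconstruction": ["*.rec"],
}


def stage_in(sources, scratch):
    """Copy inputs into the scratch directory (stand-in for a remote transfer)."""
    for src in sources:
        shutil.copy(src, scratch / Path(src).name)


def stage_out(scratch, patterns, dest):
    """Copy matching outputs from scratch to dest and return their names."""
    dest.mkdir(parents=True, exist_ok=True)
    staged = []
    for pattern in patterns:
        for path in sorted(scratch.glob(pattern)):
            shutil.copy(path, dest / path.name)
            staged.append(path.name)
    return staged


def run_wrapped(job_type, command, inputs, dest):
    """Stage inputs in, run the payload in scratch, then stage outputs out."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unknown job type: {job_type}")
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp)
        stage_in(inputs, scratch)
        # The payload reads and writes only relative paths inside scratch.
        subprocess.run(command, cwd=scratch, check=True)
        return stage_out(scratch, JOB_TYPES[job_type], Path(dest))
```

In this sketch the payload only ever sees files in its scratch directory, so all contact with the main site's storage is confined to the two staging functions, which is where a token-based transfer would be plugged in.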
About 125 worker nodes with 4k CPU cores at the remote site have joined the IHEP LHAASO cluster through the distributed computing system, and the LHAASO jobs running on them produced 700 TB of simulation data in 6 months.