The OSG-operated Open Science Pool (OSPool) is an HTCondor-based virtual cluster that aggregates resources from compute clusters provided by several organizations. A user can submit batch jobs to the OSG-maintained scheduler, and the jobs will eventually run on a combination of supported compute clusters without any further user action. Most of the resources are not owned by, or even dedicated to, OSG, so demand-based dynamic provisioning is important for maximizing usage without incurring excessive waste.
OSG has long relied on GlideinWMS for most of its resource provisioning needs, but GlideinWMS is limited to resources that provide a Grid-compliant Compute Entrypoint. To work around this limitation, the OSG software team developed a pilot container that resource providers can use to contribute directly to the OSPool. The problem with that approach is that it is not demand-driven, relegating it to backfill scenarios only.
To address this limitation, a demand-driven direct provisioner of Kubernetes resources has been developed and successfully used on the PRP. The setup still relies on the OSG-maintained backfill container images; it only automates the provisioning matchmaking and the successive requests. The provisioner has also recently been extended to support Lancium, a green computing cloud provider with a Kubernetes-like proprietary interface. The provisioner logic was intentionally kept very simple, making this extension a low-cost project.
Both PRP and Lancium resources have been provisioned exclusively using this mechanism for almost a year with great results.
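As an illustration of what demand-driven provisioning means in practice, the following minimal Python sketch computes how many new pilot pods to request in one provisioning cycle. It is not the actual provisioner code: the function name `pods_to_request`, the parameters `max_pending` and `max_per_cycle`, and their default values are all hypothetical, and the inputs (the number of idle jobs in the scheduler and the number of pilot pods still waiting to start) are assumed to be obtained elsewhere.

```python
def pods_to_request(idle_jobs, pending_pods, max_pending=10, max_per_cycle=5):
    """Return how many new pilot pods to request in this cycle.

    All names and default limits here are illustrative assumptions,
    not taken from the real provisioner.
    """
    # Demand that is not already covered by pods waiting to start.
    unmet = idle_jobs - pending_pods
    if unmet <= 0:
        return 0
    # Keep the number of not-yet-running pods under a fixed cap,
    # so that a burst of idle jobs does not flood the provider.
    headroom = max(max_pending - pending_pods, 0)
    # Request no more than the unmet demand, the headroom,
    # or the per-cycle limit, whichever is smallest.
    return min(unmet, headroom, max_per_cycle)
```

With no idle jobs the function requests nothing, which is the property that distinguishes this approach from always-on backfill: resources are requested only while there is demand that existing requests do not already cover.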