There is no lack of approaches for managing the deployment of distributed services – in the last 15 years of running distributed infrastructure, the OSG Consortium has seen many of them. One persistent problem has been each physical site has its style of configuration management and service operations, leading to a partitioning of the staff knowledge and inflexibility in migrating services between sites.
Recently, the team has been migrating the OSG Fabric of Services to be deployed via Kubernetes which provides a common service orchestration fabric across all sites. However, this leaves open a question - how does the team interact with Kubernetes? To coordinate this new style of deployment among geographically distributed clusters and team members, the team has adopted "GitOps", an operational model that uses Git version control repositories to drive service updates. Git-driven operations provides all the benefits of version control such as recording the who, what, when, and why of any given change. But, more powerfully, automated agents synchronize the current state of the Git repository with the current state of the Kubernetes clusters, streamlining the ability to redeploy services from scratch or transfer services between clusters. In this paper, we will describe the setup that enables GitOps deployments of central OSG services and the lessons learned along the way, including rebuilding a suite of services after a critical failure and our experiences with providing high-availability services across multiple Kubernetes clusters.
|Consider for long presentation||No|