CERN IT has consolidated all life-cycle management of its physical server fleet on the Ironic bare-metal API. Every stage can be managed within this framework: initial registration on first boot, inventory checking, burn-in and benchmarking for acceptance, provisioning to end users, repairs during service, and retirement at the end of a server’s life. In this presentation we will follow a server throughout its life in the CERN data centre and explain how this approach enables us to handle a fleet of 10’000 nodes in an automated and efficient way, and to prepare for the new data centre which is currently being built. We will also cover the main challenges we faced when moving to this system, such as the transparent adoption of nodes already in production and after-the-fact inventory updates, and finally round things off with our “GRUBsetta stone”, a collection of boot errors and what they really mean.
Consider for long presentation: Yes