Chapter 2
Metal3 Architecture and Internal Workings
Behind every successful bare metal Kubernetes deployment lies a sophisticated choreography of controllers, APIs, and declarative abstractions. This chapter unpacks the inner workings of Metal3, peeling back the layers to expose how its modular components transform hardware into programmable infrastructure. By mapping data flows and exploring integration points, you'll see how Metal3 operationalizes complexity and creates new possibilities for scalable, secure, and automated cluster management.
2.1 Overview of Metal3 Components
Metal3 is architected around three core components: Bare Metal Operator (BMO), OpenStack Ironic, and Cluster API Provider Metal3 (CAPM3). Each serves distinct roles within the infrastructure lifecycle while collectively enabling the full automation of bare metal provisioning in Kubernetes environments. The modular design of Metal3 supports flexibility, extensibility, and clear separation of concerns, aligning infrastructure management closely with declarative Kubernetes paradigms.
At the foundation of the Metal3 stack lies OpenStack Ironic, a mature bare metal provisioning service that manages the lifecycle of physical machines as a service. Ironic abstracts hardware-level operations such as power management, hardware inspection, BIOS configuration, and deployment of operating systems through network booting. Crucially, Ironic utilizes drivers tailored to diverse hardware vendors, enabling uniform control even in heterogeneous data center environments. Its API-driven model provides the essential low-level functionalities required to enroll, manage, and provision bare metal hosts, handling boot workflows and state transitions from provisioning to decommissioning.
Sitting directly above Ironic, the Bare Metal Operator (BMO) integrates these capabilities into the Kubernetes ecosystem by exposing bare metal hosts as native Kubernetes Custom Resources (CRs) of kind BareMetalHost. BMO acts as a bridge between the Kubernetes API and Ironic's API, continuously reconciling the desired state of bare metal resources with actual hardware states. Responsibilities of BMO include managing hardware inventory, orchestrating power and boot state changes, handling hardware inspection requests, and updating status conditions in the Kubernetes API. By embedding hardware lifecycle operations into Kubernetes' control loop, BMO transforms physical machines into first-class, declaratively managed resources akin to Pods or Persistent Volumes. This approach improves observability and control, and it lets infrastructure changes be applied and tracked with standard Kubernetes tooling.
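As a minimal sketch of what such a resource looks like, the following Python snippet builds a BareMetalHost manifest as a plain dictionary and prints it as JSON. The host name, namespace, BMC address, MAC address, and Secret name are hypothetical placeholders, and the helper function bare_metal_host is introduced here only for illustration; the field names follow the metal3.io/v1alpha1 BareMetalHost schema as commonly documented, so verify them against the CRD version installed in your cluster.

```python
import json


def bare_metal_host(name, bmc_address, boot_mac, credentials_secret, online=True):
    """Build a BareMetalHost manifest as a plain dict (illustrative helper)."""
    return {
        "apiVersion": "metal3.io/v1alpha1",
        "kind": "BareMetalHost",
        # The namespace is a hypothetical choice for this example.
        "metadata": {"name": name, "namespace": "metal3"},
        "spec": {
            # Desired power state: True means the host should be powered on.
            "online": online,
            # MAC address of the NIC used for network boot.
            "bootMACAddress": boot_mac,
            "bmc": {
                # The URL scheme selects the management protocol (here IPMI).
                "address": bmc_address,
                # Name of a Secret holding the BMC username and password.
                "credentialsName": credentials_secret,
            },
        },
    }


# All values below are placeholders, not real infrastructure.
host = bare_metal_host(
    name="worker-0",
    bmc_address="ipmi://192.0.2.10",
    boot_mac="00:11:22:33:44:55",
    credentials_secret="worker-0-bmc-secret",
)
print(json.dumps(host, indent=2))
```

The printed JSON could be applied with standard Kubernetes tooling once BMO and its CRDs are installed; BMO would then register the host with Ironic and report progress in the resource's status.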
Completing the triad, the Cluster API Provider Metal3 (CAPM3) extends the Kubernetes Cluster API framework to enable declarative lifecycle management of bare metal clusters. Cluster API (CAPI) provides a vendor-agnostic method to manage Kubernetes clusters themselves as Kubernetes resources, fostering automation of cluster creation, scaling, and upgrades. CAPM3 delivers the bare metal-specific implementation of CAPI's infrastructure provider interfaces by coordinating BMO-managed bare metal hosts within cluster provisioning workflows. CAPM3 consumes Cluster API Cluster and Machine resources together with its own Metal3Cluster and Metal3Machine CRs, translating desired cluster and node states into operations that BMO and Ironic execute, thus tying cluster lifecycle operations to physical hardware management. In effect, CAPM3 orchestrates node bootstrap, the matching of hosts to the cluster specification, and cluster state reconciliation over a physical infrastructure substrate.
This three-tier decomposition into Ironic, BMO, and CAPM3 realizes several design rationales:
- Clear boundary and responsibility separation: Ironic performs hardware-level actions as an independent provisioning service, while BMO handles the imperative-to-declarative transition and provides the Kubernetes-native abstraction of hardware. CAPM3 works with higher-level cluster lifecycle concepts without delving into hardware specifics.
- Reusability and ecosystem synergy: By adopting OpenStack Ironic, Metal3 harnesses a robust industry standard for bare metal provisioning, benefiting from its extensive driver ecosystem, operational maturity, and hardware support. This prevents duplication of foundational capabilities within Kubernetes.
- Modularity and extensibility: The three components interface through well-defined APIs and resources, enabling substitution or enhancements without wholesale redesign. For example, replacement of Ironic with alternative provisioning backends is theoretically feasible without impacting core Kubernetes abstractions.
- Alignment with Kubernetes declarative model: BMO and CAPM3 follow the Kubernetes declarative philosophy, which also underpins GitOps practices: desired infrastructure state is declared as Kubernetes CRs, and controllers continuously reconcile the actual state toward it. This fosters automation, idempotency, and integration with emerging Kubernetes ecosystem tools.
- Resilience and scalability: Each component operates autonomously, communicating via APIs and CRs, which aids in isolating faults, scaling control plane components independently, and facilitating distributed operational models.
The typical workflow orchestrated by these components is as follows: an administrator or automation pipeline defines a cluster specification via Cluster API CRs, specifying the desired cluster topology. CAPM3 reconciles the resulting Metal3Machine objects and orchestrates their provisioning by claiming compatible bare metal hosts, which are represented as BareMetalHost objects. BMO then calls Ironic to perform node inspection, hardware configuration, and power state transitions, and ultimately to trigger operating system deployment via Ironic's provisioning interfaces. Throughout this lifecycle, status and health information propagate upward, enabling continuous feedback and automated remediation.
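The host-level part of this workflow can be pictured as a small state machine. The sketch below is a toy model, not BMO's actual implementation: it walks a simplified subset of the provisioning states that a BareMetalHost reports (registering, inspecting, available, provisioning, provisioned) and assumes, for illustration, that a host waits in the available state until an image has been requested. The function names and the has_image flag are hypothetical.

```python
# Simplified subset of BareMetalHost provisioning states. The real operator
# has additional states (for example for errors, preparation, and
# deprovisioning) that this toy model leaves out.
TRANSITIONS = {
    "registering": "inspecting",
    "inspecting": "available",
    "available": "provisioning",
    "provisioning": "provisioned",
}


def reconcile_step(state, has_image):
    """Return the next state, or the same state if nothing can progress."""
    # A host with no image requested stays available for a future consumer.
    if state == "available" and not has_image:
        return state
    # States without an outgoing transition (e.g. "provisioned") are stable.
    return TRANSITIONS.get(state, state)


def run(has_image, max_steps=10):
    """Apply reconcile_step repeatedly until the state stops changing."""
    state = "registering"
    history = [state]
    for _ in range(max_steps):
        next_state = reconcile_step(state, has_image)
        if next_state == state:
            break
        state = next_state
        history.append(state)
    return history


print(run(has_image=False))
print(run(has_image=True))
```

The point of the model is that each reconciliation pass only moves the host one step and is safe to repeat: calling reconcile_step on a stable state returns the same state, which mirrors the idempotent behaviour expected of a Kubernetes controller.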
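Inter-component communication relies on secure REST APIs and Kubernetes resource watches. CAPM3's controllers watch the Cluster API Machine resources and their associated Metal3Machine CRs, select a suitable BareMetalHost for each one, and record the claim on the host. BMO's controller watches BareMetalHost CRs, each of which represents an individual machine, and drives their lifecycle by calling Ironic's REST API, translating high-level resource states into hardware-specific instructions. Because many Ironic operations are long-running, BMO typically issues a request and then checks on its progress in later reconciliation passes. CAPM3 remains cluster-aware, observing cluster-wide resource status to adapt machine provisioning accordingly.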
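The host selection step can be sketched as follows. This is a simplified illustration under stated assumptions rather than CAPM3's real algorithm: hosts are plain dictionaries shaped like BareMetalHost objects, a host is considered claimable when it is in the available state and has no consumerRef, and the selector is a flat label match. The function names matches and claim_host, and all host and machine names, are hypothetical.

```python
def matches(labels, selector):
    """True if every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def claim_host(hosts, selector, machine_name):
    """Claim the first free, available host matching the selector.

    Returns the claimed host's name, or None if no host qualifies.
    """
    for host in hosts:
        free = host["spec"].get("consumerRef") is None
        available = host["status"]["provisioning"]["state"] == "available"
        labels = host["metadata"].get("labels", {})
        if free and available and matches(labels, selector):
            # Recording the consumer marks the host as taken, so later
            # selection passes skip it.
            host["spec"]["consumerRef"] = {
                "kind": "Metal3Machine",
                "name": machine_name,
            }
            return host["metadata"]["name"]
    return None


# Hypothetical inventory: one host still inspecting, two available.
hosts = [
    {"metadata": {"name": "host-a", "labels": {"rack": "r1"}},
     "spec": {}, "status": {"provisioning": {"state": "inspecting"}}},
    {"metadata": {"name": "host-b", "labels": {"rack": "r1"}},
     "spec": {}, "status": {"provisioning": {"state": "available"}}},
    {"metadata": {"name": "host-c", "labels": {"rack": "r2"}},
     "spec": {}, "status": {"provisioning": {"state": "available"}}},
]

first = claim_host(hosts, {"rack": "r1"}, "worker-machine-0")
second = claim_host(hosts, {"rack": "r1"}, "worker-machine-1")
print(first, second)
```

In this toy inventory the second request finds no host, because the only available host in rack r1 has already been claimed; a real controller would simply retry on a later reconciliation pass.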
Metal3's architectural partition into Ironic, BMO, and CAPM3 encapsulates a layered control framework spanning hardware to cluster management. This division enforces strong modular boundaries while maintaining cohesive end-to-end workflows, enabling robust, scalable, and Kubernetes-native bare metal automation. Understanding the distinct purpose and interaction modalities of each component is essential when designing, deploying, or customizing bare metal Kubernetes clusters utilizing Metal3 technology.
2.2 Cluster API and Declarative Provisioning
Kubernetes Cluster API (CAPI) introduces a paradigm shift in managing clusters and their constituent nodes by employing a fully declarative approach that integrates seamlessly with Kubernetes' native resource model. Rather than relying on imperative commands or bespoke provisioning scripts, CAPI utilizes custom resource definitions (CRDs) to represent the lifecycle of clusters and nodes as first-class Kubernetes objects. This abstraction allows operators and developers to define desired states for infrastructure components within standard Kubernetes manifests, enabling automated reconciliation loops to maintain consistency and observe changes.
At the core of the Cluster API model are several key custom resources: Cluster, Machine, MachineDeployment, and MachineSet. The Cluster resource acts as an overarching abstraction that references the control plane and infrastructure-specific details of the cluster environment. Machine resources correspond to individual nodes, whether control plane or worker nodes; each one specifies a Kubernetes version and references its bootstrap configuration and a provider-specific infrastructure resource, which carries details such as the machine image and hardware selection. MachineDeployment and MachineSet resources provide declarative APIs for managing groups of machines, analogous to the Kubernetes Deployment and ReplicaSet resources respectively, and MachineDeployment supports rolling update strategies, thus facilitating seamless scaling and upgrades.
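To make the relationship between these resources concrete, here is a hedged sketch that builds a MachineDeployment manifest as a Python dictionary. The API versions (cluster.x-k8s.io/v1beta1 and the matching bootstrap and infrastructure groups) are assumptions that depend on the Cluster API and CAPM3 releases in use, and the cluster name, pool name, template names, label key, and Kubernetes version are placeholders. The helper machine_deployment is introduced only for this example.

```python
import json


def machine_deployment(cluster, name, replicas, k8s_version):
    """Build a MachineDeployment manifest as a plain dict (illustrative)."""
    labels = {"nodepool": name}  # hypothetical label key
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",  # assumed API version
        "kind": "MachineDeployment",
        "metadata": {"name": name},
        "spec": {
            "clusterName": cluster,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "clusterName": cluster,
                    "version": k8s_version,
                    # How the node joins the cluster (kubeadm bootstrap).
                    "bootstrap": {
                        "configRef": {
                            "apiVersion": "bootstrap.cluster.x-k8s.io/v1beta1",
                            "kind": "KubeadmConfigTemplate",
                            "name": name + "-bootstrap",
                        }
                    },
                    # Provider-specific machine template; for Metal3 this
                    # is where the image and host selection are described.
                    "infrastructureRef": {
                        "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1",
                        "kind": "Metal3MachineTemplate",
                        "name": name + "-template",
                    },
                },
            },
        },
    }


md = machine_deployment("demo-cluster", "workers", replicas=3, k8s_version="v1.29.0")
print(json.dumps(md, indent=2))
```

Scaling the pool then amounts to changing the replicas field and letting the controllers create or remove Machine objects until the observed count matches.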
Metal3, as a Kubernetes-native bare-metal provisioning solution, leverages this declarative framework by implementing provider-specific controllers that reconcile these Cluster API resources with physical infrastructure states. The Metal3 provider (CAPM3) translates between the abstract machine and cluster lifecycle expressed in Kubernetes manifests and the low-level provisioning and management of physical hosts, which BMO and Ironic carry out using technologies such as the Intelligent Platform Management Interface (IPMI), Redfish, and iPXE booting. This approach exploits the power of infrastructure-as-code to automate workflows including bare-metal node registration, provisioning, deprovisioning, networking configuration, and firmware updates.
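In Metal3, the management protocol for a host is typically indicated by the URL scheme of its BMC address. The sketch below shows one way such an address could be classified; it is an illustrative helper, not the operator's own parsing code. The function name bmc_protocol is hypothetical, only the IPMI and Redfish families mentioned above are recognized, and anything else, including an address with no scheme, is reported as "unknown" rather than guessed.

```python
def bmc_protocol(address):
    """Classify a BMC address by its URL scheme.

    Returns "ipmi", "redfish", or "unknown". Any scheme beginning with
    "redfish" is treated as part of the Redfish family; that grouping is
    an assumption of this sketch.
    """
    scheme, separator, _rest = address.partition("://")
    if not separator:
        # No "scheme://" prefix at all; this sketch does not apply a default.
        return "unknown"
    scheme = scheme.lower()
    if scheme == "ipmi":
        return "ipmi"
    if scheme.startswith("redfish"):
        return "redfish"
    return "unknown"


# Placeholder addresses from the documentation IP range.
for addr in (
    "ipmi://192.0.2.10",
    "redfish://192.0.2.11/redfish/v1/Systems/1",
    "192.0.2.12",
):
    print(addr, "->", bmc_protocol(addr))
```

Keeping the protocol choice in the address means a heterogeneous fleet can be described with the same resource type, with only the address differing from host to host.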
The reconciliation process...