Chapter 2
Keptn Architecture and Core Concepts
Keptn applies event-driven principles to cloud-native automation through a modular, extensible control plane designed for reliability at scale. This chapter examines Keptn's core internals and abstractions and shows how its architecture lets platform teams orchestrate delivery, operations, and remediation with greater autonomy and visibility.
2.1 Keptn's Event-Driven Control Plane
Keptn's architecture centers around an event-driven control plane that orchestrates complex automation workflows within cloud-native environments. This control plane is a modular and loosely coupled ecosystem designed to manage the lifecycle of delivery, operations, and remediation tasks with high scalability and resilience. Understanding Keptn's control plane requires a detailed examination of its event flows, message broker interactions, and the delineation between core services and extensions.
At the heart of the control plane lies an event-driven messaging system that connects the distributed components. Keptn components exchange messages primarily through a message broker, a role that NATS fills in Keptn's control plane. This foundational layer ensures asynchronous, reliable, and scalable event propagation. Events in Keptn denote discrete state changes or commands within the software delivery lifecycle and carry the contextual information that recipient services need.
The lifecycle of an event in Keptn begins with a trigger, often originating from an external system such as a Continuous Integration (CI) pipeline, a user-initiated deployment request, or an automated monitoring alert. When the trigger fires, a Keptn event is constructed and published to the message broker. Each event conforms to the CloudEvents specification, which ensures interoperability and extensibility across heterogeneous components.
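To make the envelope concrete, the following is a minimal sketch of how such an event might be assembled as a plain dictionary. The required attributes (`specversion`, `id`, `source`, `type`) come from CloudEvents 1.0. The `sh.keptn.event.` type prefix, the `shkeptncontext` correlation attribute, the `build_keptn_event` helper, and the project, stage, and service names are assumptions made for illustration, not something this chapter defines.

```python
import json
import uuid
from datetime import datetime, timezone


def build_keptn_event(event_type, project, stage, service, source="ci-pipeline"):
    """Build a CloudEvents 1.0 envelope carrying a Keptn-style payload."""
    return {
        # Required CloudEvents context attributes
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,  # hypothetical name of the triggering system
        "type": event_type,
        # Optional CloudEvents attributes
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        # Extension attribute assumed here to correlate events of one workflow
        "shkeptncontext": str(uuid.uuid4()),
        # Payload: the scope the receiving service should act in
        "data": {"project": project, "stage": stage, "service": service},
    }


event = build_keptn_event(
    "sh.keptn.event.deployment.triggered", "sockshop", "dev", "carts"
)
print(json.dumps(event, indent=2))
```

Because the envelope is ordinary JSON, any component that can read the `type` and `data` fields can take part in the workflow, regardless of the language it is written in.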
Once an event is published, it is consumed by one or more Keptn services, each of which processes, augments, or forwards the event according to the workflow stage. Keptn's control plane divides these services into two primary categories: core services and extension services. Core services orchestrate essential lifecycle operations, such as event routing, state management, and sequence execution. Extension services implement domain-specific logic, including quality gate evaluation, automated tests, or deployment strategies.
The core services maintain event sequencing and state cohesion through a dedicated component, the Shipyard Controller. This controller manages the execution of predefined workflows described in declarative shipyard.yaml files. On receiving a triggering event, the Shipyard Controller looks up the requested sequence and emits subsequent events to initiate tasks such as deployment, testing, and evaluation. It also collects the result of each task before progressing to the next one, thereby enforcing workflow integrity.
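The lookup step can be sketched as follows. The shipyard definition is shown as a Python dictionary, standing in for the parsed content of a shipyard.yaml file. The `apiVersion` value, the stage, sequence, and task names, and the helpers `tasks_for` and `triggered_event_types` are assumptions for illustration; the real controller also tracks state and results, which this sketch omits.

```python
# Parsed form of a hypothetical shipyard.yaml file.
SHIPYARD = {
    "apiVersion": "spec.keptn.sh/0.2.2",  # assumed version string
    "kind": "Shipyard",
    "metadata": {"name": "example-shipyard"},
    "spec": {
        "stages": [
            {
                "name": "dev",
                "sequences": [
                    {
                        "name": "delivery",
                        "tasks": [
                            {"name": "deployment"},
                            {"name": "test"},
                            {"name": "evaluation"},
                        ],
                    }
                ],
            }
        ]
    },
}


def tasks_for(shipyard, stage_name, sequence_name):
    """Return the ordered task names of one sequence in one stage."""
    for stage in shipyard["spec"]["stages"]:
        if stage["name"] != stage_name:
            continue
        for sequence in stage["sequences"]:
            if sequence["name"] == sequence_name:
                return [task["name"] for task in sequence["tasks"]]
    raise KeyError(f"no sequence {sequence_name!r} in stage {stage_name!r}")


def triggered_event_types(shipyard, stage_name, sequence_name):
    """Map each task to the event type that would start it."""
    names = tasks_for(shipyard, stage_name, sequence_name)
    return [f"{name}.triggered" for name in names]


print(triggered_event_types(SHIPYARD, "dev", "delivery"))
# ['deployment.triggered', 'test.triggered', 'evaluation.triggered']
```

Keeping the sequence in declarative data rather than in code is what allows the order of tasks to change without redeploying any service.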
Event interactions in Keptn follow a chained pattern in which each service acts on the event data and emits follow-up events signaling success, failure, or intermediate states. This chain is inherently asynchronous, which improves fault tolerance by decoupling workflow progress from any single component's availability or performance. If a service is temporarily offline or slow, events can be held by the broker (provided it is configured to persist messages) and delivered once the service recovers, so the workflow resumes and eventually reaches a consistent state.
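The buffering behavior can be illustrated with a toy in-memory broker. This is only a sketch of the idea: `BufferingBroker` is a hypothetical class, not the API of any real message broker, and it ignores persistence, ordering across subscribers, and acknowledgements.

```python
from collections import deque


class BufferingBroker:
    """Toy in-memory broker that queues events until a consumer is online."""

    def __init__(self):
        self._pending = deque()
        self._consumer = None

    def publish(self, event):
        """Queue the event and deliver it immediately if a consumer is attached."""
        self._pending.append(event)
        self._drain()

    def connect(self, consumer):
        """Attach a consumer and deliver anything buffered while it was away."""
        self._consumer = consumer
        self._drain()

    def disconnect(self):
        """Detach the consumer; later events stay queued."""
        self._consumer = None

    def _drain(self):
        while self._consumer is not None and self._pending:
            self._consumer(self._pending.popleft())


received = []
broker = BufferingBroker()
broker.publish("deployment.triggered")  # no consumer yet: buffered
broker.connect(received.append)         # consumer comes online: delivered
broker.disconnect()
broker.publish("test.triggered")        # buffered again
broker.connect(received.append)
print(received)  # ['deployment.triggered', 'test.triggered']
```

The publisher never learns whether the consumer was online, which is the decoupling the paragraph above describes.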
A critical aspect of Keptn's control plane is the separation of concerns between core and extension services. Core services abstract the orchestration and infrastructure concerns, enabling uniform handling of events irrespective of business-specific logic. Conversely, extension services encapsulate the customizable parts of the delivery pipeline, such as integration with testing tools, incident detection systems, or custom deployment mechanisms. This architecture fosters pluggability; new capabilities can be introduced without modifying core components, simply by deploying additional extension services that subscribe to relevant events.
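A minimal sketch of this pluggability, assuming a simple in-process dispatcher: `EventRouter` stands in for the core side, and `helm_deployer` and `chat_notifier` are hypothetical extensions. None of these names come from Keptn itself.

```python
from collections import defaultdict


class EventRouter:
    """Core-side dispatcher that knows nothing about extension logic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type):
        """Decorator registering a handler for one event type."""
        def register(handler):
            self._subscribers[event_type].append(handler)
            return handler
        return register

    def dispatch(self, event_type, data):
        """Call every handler subscribed to event_type and collect results."""
        return [handler(data) for handler in self._subscribers[event_type]]


router = EventRouter()


@router.subscribe("deployment.triggered")
def helm_deployer(data):  # hypothetical extension
    return f"deploying {data['service']} to {data['stage']}"


# A new capability is added by registering another handler;
# the EventRouter class itself is not modified.
@router.subscribe("deployment.triggered")
def chat_notifier(data):  # hypothetical extension
    return f"notifying channel about {data['service']}"


results = router.dispatch("deployment.triggered", {"service": "carts", "stage": "dev"})
print(results)
```

In a distributed setup the subscription would be made against the message broker rather than an in-process object, but the division of responsibility is the same.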
The event payloads carry contextual metadata, including project identifiers, stage information, service definitions, and deployment configurations. This metadata enables extension services to execute their logic accurately within the relevant scope. Moreover, events consistently report their status, timestamps, and traces, facilitating observability and troubleshooting within distributed workflows.
Consider a delivery sequence that begins with a deployment.triggered event. The control plane routes this event to the deployment extension service, which acknowledges it with a deployment.started event and deploys to the target environment. When the deployment completes, the extension emits a deployment.finished event. On receiving it, the Shipyard Controller triggers the next task, typically a test run, by emitting a test.triggered event, and the test service reports completion with a test.finished event. A quality gate evaluation follows the same pattern with evaluation.triggered and evaluation.finished events. If a failure occurs, remediation extension services may be engaged automatically via corresponding events.
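The progression can be traced with a short simulation. It assumes a convention in which the controller emits `<task>.triggered` and the responsible extension answers with `<task>.started` and `<task>.finished`. The `run_sequence` function, the `pass`/`fail` result strings, and the handler table are illustrative assumptions, and the whole exchange is collapsed into one synchronous loop for readability.

```python
def run_sequence(tasks, handlers):
    """Run tasks in order and return the trace of emitted event types.

    handlers maps a task name to a callable returning "pass" or "fail".
    The sequence stops at the first task that does not pass.
    """
    trace = []
    for task in tasks:
        trace.append(f"{task}.triggered")  # emitted by the controller
        trace.append(f"{task}.started")    # acknowledged by the extension
        result = handlers[task]()
        trace.append(f"{task}.finished:{result}")
        if result != "pass":
            break
    return trace


handlers = {
    "deployment": lambda: "pass",
    "test": lambda: "pass",
    "evaluation": lambda: "fail",  # simulated quality gate failure
    "release": lambda: "pass",
}
trace = run_sequence(["deployment", "test", "evaluation", "release"], handlers)
for entry in trace:
    print(entry)
```

With the simulated failure, the trace ends at `evaluation.finished:fail` and no `release.triggered` event is produced, which is the point at which a remediation workflow could take over.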
This event-driven choreography is inherently extensible. New event types can be introduced along with corresponding extensions to enrich the automation capabilities. The communication via a message broker abstracts away service locations, supporting distributed deployments across multiple clusters or even cloud providers.
In summary, Keptn's event-driven control plane achieves loosely coupled automation through a well-defined event lifecycle, a robust message broker backbone, and a strict separation between core orchestration services and domain-specific extensions. This architecture enables scalable, resilient delivery and operational workflows that can be dynamically adapted to evolving requirements and infrastructure landscapes.
2.2 Project, Stage, and Service Model
Keptn's domain model is structured around three primary abstractions: projects, stages, and services. These abstractions establish a clear organizational framework that aligns closely with modern enterprise delivery pipelines, enabling environment isolation, progressive delivery strategies, and modular workflow orchestration. Understanding this layered model is essential to leverage Keptn's full capabilities for continuous delivery and automated operations.
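As a rough orientation, the containment relationship between the three abstractions can be sketched with a few data classes. The class layout, the `onboard_service` and `scope` helpers, and the example names are assumptions for illustration and do not mirror Keptn's internal data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str


@dataclass
class Stage:
    name: str
    services: dict = field(default_factory=dict)


@dataclass
class Project:
    name: str
    stages: dict = field(default_factory=dict)

    def add_stage(self, stage_name):
        """Add an empty stage (environment) to the project."""
        self.stages[stage_name] = Stage(stage_name)

    def onboard_service(self, service_name):
        """Register the service in every stage of the project."""
        for stage in self.stages.values():
            stage.services[service_name] = Service(service_name)

    def scope(self, stage_name, service_name):
        """Return the (project, stage, service) triple that scopes an event."""
        service = self.stages[stage_name].services[service_name]
        return (self.name, stage_name, service.name)


project = Project("sockshop")
for stage_name in ("dev", "staging", "production"):
    project.add_stage(stage_name)
project.onboard_service("carts")
print(project.scope("staging", "carts"))  # ('sockshop', 'staging', 'carts')
```

The triple returned by `scope` is the same kind of project, stage, and service context that events carry in their payloads.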
Project Abstraction
A project in Keptn represents a logical boundary encapsulating all resources, configurations, and processes associated with a particular product or application lifecycle. This level aggregates multiple stages and services, providing a holistic view of delivery activities and governance policies for that product. Projects serve as natural organizational units for teams, allowing separation of concerns and independent management of delivery pipelines.
Each project aggregates metrics, events, and traces generated across its constituent services and stages, facilitating centralized monitoring and reporting. From a governance perspective, projects provide a natural scope for role-based access control (RBAC): permissions and policies can be assigned per product, so that stakeholders operate within clearly defined boundaries.
Stage Abstraction
Stages model the progression of software through discrete environments such as dev, staging, and production. These provide environment isolation that is critical to implementing progressive delivery patterns, such as canary releases or blue-green deployments. By defining stages explicitly, Keptn permits pipeline workflows to propagate artifacts and configurations through a controlled sequence of validation and deployment steps.
This stage-based segmentation accommodates environment-specific configurations and quality gates, enabling distinct observability and remediation policies for each environment. For example, failure thresholds or test suites applied in a staging stage may differ significantly from those in production, allowing delivery workflows to be finely tuned per environment.
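A minimal sketch of stage-specific gating, assuming two made-up metrics with different limits per stage. The threshold table, the metric names, and the `evaluate_gate` function are hypothetical and far simpler than a real quality gate definition.

```python
# Illustrative per-stage thresholds; metric names and values are made up.
STAGE_GATES = {
    "staging": {"max_error_rate": 0.05, "max_p95_latency_ms": 800},
    "production": {"max_error_rate": 0.01, "max_p95_latency_ms": 400},
}


def evaluate_gate(stage, error_rate, p95_latency_ms):
    """Return "pass" if both metrics are within the stage's limits, else "fail"."""
    gate = STAGE_GATES[stage]
    within_limits = (
        error_rate <= gate["max_error_rate"]
        and p95_latency_ms <= gate["max_p95_latency_ms"]
    )
    return "pass" if within_limits else "fail"


measurement = {"error_rate": 0.02, "p95_latency_ms": 500}
print(evaluate_gate("staging", **measurement))     # pass
print(evaluate_gate("production", **measurement))  # fail
```

The same measurement passes in staging and fails in production, which shows how one build can be held to progressively stricter criteria as it moves through the stages.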
Stages not only demarcate environments, whether physical or virtual, but also scope stage-specific integrations and webhook triggers. This isolation ensures that workflows act deterministically within each environment and avoids cross-environment interference.
Service Abstraction
At the finest granularity, services represent deployable units (microservices, functions, or monolithic application components) that make up the overall system. Services capture the modular nature of contemporary software architectures. Keptn treats each service as an independent entity within a project and stage, enabling granular control over its delivery lifecycle, quality gates, and remediation actions.
This abstraction aligns with microservice principles, where individual services can follow independent release cadences and workflows without impacting unrelated parts of...