Chapter 2
Seldon Core Architecture and Key Concepts
Unlock the essential mechanisms that empower Seldon Core to deliver production-grade model serving at scale. This chapter demystifies the inner workings of deployment orchestration, model pipelines, and extensibility within Kubernetes, revealing how Seldon Core harmonizes complex machine learning workflows with cloud-native primitives. Prepare to navigate the blueprint of intelligent application delivery and explore the foundational patterns and APIs that make advanced MLOps possible.
2.1 Custom Resource Definitions and the SeldonDeployment CR
Kubernetes Custom Resource Definitions (CRDs) extend the Kubernetes API to support domain-specific abstractions beyond the standard primitive resources. By defining these custom resources, operators enable developers to work with higher-level declarative constructs that encapsulate complex application logic and lifecycle management within native Kubernetes workflows. The declarative nature of CRDs underpins reproducibility and control, ensuring that system state converges towards the user-defined specification, with the Kubernetes control plane and associated operators handling reconciliation automatically.
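To make the registration pattern concrete, the following is a minimal, illustrative CRD skeleton using the standard apiextensions.k8s.io/v1 API. It is a sketch only: Seldon Core's actual CRD carries a far richer OpenAPI schema, and details such as the short name are shown as plausible assumptions rather than an authoritative definition.

```yaml
# Illustrative sketch of how a CRD registers a new API type with Kubernetes.
# The real SeldonDeployment CRD defines a full validation schema; here the
# schema is left open for brevity.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # CRD names follow the convention <plural>.<group>
  name: seldondeployments.machinelearning.seldon.io
spec:
  group: machinelearning.seldon.io
  scope: Namespaced
  names:
    kind: SeldonDeployment
    plural: seldondeployments
    singular: seldondeployment
    shortNames:
    - sdep          # assumed short name for kubectl convenience
  versions:
  - name: v1
    served: true    # this version is exposed by the API server
    storage: true   # and used as the persisted storage version
    schema:
      openAPIV3Schema:
        type: object
        # Accept arbitrary fields; a production CRD would validate them.
        x-kubernetes-preserve-unknown-fields: true
```

Once such a CRD is applied, the API server accepts SeldonDeployment objects exactly as it does built-in resources, which is what allows the operator to watch and reconcile them.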
The SeldonDeployment is a prime example of a CRD tailored for machine learning (ML) model serving within Kubernetes. It defines an abstract, extensible API object that represents an entire ML inference service deployment, including one or more predictive models, routing logic, scaling parameters, and operational metadata. This object serves as a single source of truth for the service definition and acts as a control point for the lifecycle management of ML services.
At the core, the SeldonDeployment CR encapsulates the specification of ML model deployment components:
- Predictive Unit Definitions: Each predictive unit corresponds to a model or transformer container, annotated with implementation and interface details such as model type, protocol (REST or gRPC), resource requests, and readiness probes.
- Graph Topology: The deployment schema specifies predictive units organized as directed acyclic graphs, where nodes represent models or transformers and edges define request or response flow. This allows complex ensembles, feature transformations, and fallback mechanisms.
- Routing and Traffic Management: Configurations such as shadow deployments, canary models, or weighted request routing are embedded within the topology to facilitate controlled rollout strategies and experimentation.
- Resource and Autoscaling Policies: Directives for CPU/memory requests and limits, together with integration hooks for the Kubernetes Horizontal Pod Autoscaler (HPA) or custom metrics, enable efficient resource utilization and reliability at scale.
- Monitoring and Explainer Annotations: Integration with metrics exporters, logging, tracing, and model explainability frameworks is declaratively incorporated, simplifying observability.
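The graph topology described above can be sketched as a predictor fragment in which a feature transformer feeds its output to a downstream model. This is a hedged illustration: the node field names follow the SeldonDeployment graph schema, while the server implementations and modelUri values are placeholders chosen for the example.

```yaml
# Sketch of a two-node inference graph inside a predictor spec:
# requests flow through the transformer node, then to its child model.
predictors:
- name: default
  replicas: 2
  graph:
    name: feature-transformer
    type: TRANSFORMER          # preprocesses the request payload
    implementation: SKLEARN_SERVER
    modelUri: gs://example-bucket/transformer   # placeholder URI
    children:
    - name: classifier
      type: MODEL              # produces the final prediction
      implementation: XGBOOST_SERVER
      modelUri: gs://example-bucket/model       # placeholder URI
      children: []             # leaf node of the DAG
```

Because the graph is declared rather than coded, reordering or extending the pipeline (for example, inserting an output transformer or a combiner over an ensemble) is a manifest change, not an application change.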
The SeldonDeployment CR schema follows Kubernetes conventions to maintain API versioning, validation, and backward compatibility. It is expressed in YAML or JSON manifest files, adhering strictly to the declaration-with-reconciliation model integral to Kubernetes. This means that an application developer or an ML engineer defines the desired state in a manifest and submits it to the Kubernetes API server. Controllers (operators) continuously observe the cluster state and automatically create, update, or delete the underlying primitives such as Pods, Services, and ConfigMaps to realize the desired deployment.
Declaring an ML service using SeldonDeployment abstracts away details of low-level Kubernetes resource orchestration and exposes a domain-specific API centered on ML workflows. This approach brings several advantages:
- Reproducibility: The declarative manifest can be version-controlled alongside model artifacts, allowing exact recreation of the ML inference environment in separate clusters or across time.
- Control and Observability: Operators enforce spec compliance and emit detailed events reflecting the reconciliation process, improving debugging and operational visibility.
- Extensibility and Portability: By expressing ML services as CRs, the ecosystem can evolve custom extensions and tooling without modifying core Kubernetes components.
- Integration: The use of standard Kubernetes RBAC, admission controllers, and namespaces ensures security and operational consistency in multi-tenant environments.
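The controlled-rollout capability mentioned earlier can also be expressed declaratively. The sketch below assumes two predictors under one SeldonDeployment with a 90/10 traffic split via the per-predictor traffic weight; model URIs and version names are illustrative placeholders.

```yaml
# Hedged sketch of a canary rollout: the stable predictor receives 90%
# of traffic, the canary 10%, all managed by the Seldon operator.
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-classifier
spec:
  predictors:
  - name: main
    traffic: 90        # percentage of requests routed here
    replicas: 3
    graph:
      name: iris-model
      implementation: SKLEARN_SERVER
      modelUri: gs://model-bucket/irismodel-v1   # placeholder URI
  - name: canary
    traffic: 10        # small slice for the candidate model
    replicas: 1
    graph:
      name: iris-model
      implementation: SKLEARN_SERVER
      modelUri: gs://model-bucket/irismodel-v2   # placeholder URI
```

Promoting the canary is then a matter of editing the weights (or removing the old predictor) and letting the reconciliation loop converge the cluster to the new specification.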
An abbreviated example of a SeldonDeployment manifest illustrates the core structure:
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-classifier
spec:
  predictors:
  - graph:
      name: iris-model
      implementation: SKLEARN_SERVER
      modelUri: gs://model-bucket/irismodel
    name: predictor-1
    replicas: 3
    componentSpecs:
    - spec:
        containers:
        - name: classifier
          ...