Chapter 2
Declarative Workflow Design and Modeling
Step beyond imperative scripting and discover the elegance of expressing complex computational processes in fully declarative terms. This chapter illuminates how Flyte's robust abstractions, strong typing, and dynamic parameterization empower engineers to model, validate, and evolve highly modular workflows. By weaving best practices with deep architectural insight, it reveals how to architect workflows that are reusable, testable, and ready for production-scale automation.
2.1 Authoring Workflows in Python
Flytekit, Flyte's Python SDK, lets developers define and manage workflows directly in Python, using familiar language constructs while relying on Flyte for orchestration. At the core of this approach are the @task and @workflow decorators, which turn ordinary Python functions into executable tasks and into workflows that orchestrate them, respectively.
A Flyte task encapsulates a discrete piece of logic, typically a single atomic operation such as a data transformation or model inference. The @task decorator marks a function as a Flyte task and uses its Python type annotations to define the task's inputs and outputs precisely. This type information documents the interface for developers and also lets Flyte check that connected inputs and outputs are type-compatible before anything runs, as well as drive serialization and artifact tracking.
Example Task Definition:
    from flytekit import task


    @task
    def preprocess_data(input_path: str) -> str:
        # Load raw data, clean it, and save to a new location.
        # clean_raw_data is assumed to be defined elsewhere.
        cleaned_path = clean_raw_data(input_path)
        return cleaned_path

Flyte does not inspect or restrict what a task body does, but tasks are easiest to reproduce, cache, and test when they contain deterministic Python code free of global side effects. Tasks can accept primitive types, supported collection types (e.g., List, Dict), and user-defined types such as dataclasses where needed, which broadens the range of computations that can be expressed natively.
In contrast, a Flyte workflow, annotated by the @workflow decorator, orchestrates multiple tasks by defining how individual task outputs feed as inputs to subsequent tasks. Workflows leverage compositionality, empowering the developer to construct complex pipelines by connecting simpler tasks, naturally expressed as Python function invocations.
Example Workflow Composition:
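    from flytekit import workflow


    @workflow
    def data_pipeline(raw_data_path: str) -> str:
        # feature_engineering and model_training are assumed to be
        # @task-decorated functions defined elsewhere.
        cleaned_data = preprocess_data(input_path=raw_data_path)
        features = feature_engineering(cleaned_data=cleaned_data)
        model_results = model_training(features=features)
        return model_results

This usage pattern is pivotal: the workflow reads like an ordinary Python function that calls other decorated functions, yet Flytekit evaluates its body at compile time, when each task call returns a placeholder (a promise) rather than a value, and records the calls as a directed acyclic graph of task executions. Because those values do not yet exist, ordinary if statements and loops cannot branch on task outputs inside a workflow. Flytekit provides dedicated constructs for this instead, such as conditional for branching, map_task for fanning out over a list, and @dynamic for workflows whose shape depends on run-time inputs.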
Flyte draws a clear boundary between Flyte-specific and generic Python code. Functions decorated with @task or @workflow must carry type annotations, while business logic, data I/O, and ancillary utilities remain in ordinary Python modules that are imported as needed. This separation aids portability and testability: the underlying logic can be covered by standard unit tests, and decorated tasks can be called locally like regular functions, without a running Flyte cluster.
Additionally, Flytekit supports richer input and output types, including nested structures and Flyte-specific types such as FlyteFile and FlyteDirectory that handle offloading data to blob storage. Tasks may define multiple outputs to surface several artifacts at once. To apply a task to every element of a collection, workflows use map_task (or a @dynamic workflow) rather than list or dictionary comprehensions, because a workflow input is a promise at compile time and cannot be iterated directly.
Multi-output Task:
    from typing import Tuple

    from flytekit import task


    @task
    def train_and_evaluate_model(training_data: str) -> Tuple[float, str]:
        ...