Pipeline
A pipeline declares the logical structure for executing components together as a machine learning (ML) workflow in a Kubernetes cluster; this includes defining execution order and conditions, as well as configuring parameter passing and data flow. Pipelines are organized as directed graphs that progress through individual steps (defined by components).
When a pipeline is executed through a run, the Kubeflow Pipelines backend converts the pipeline into instructions to launch Kubernetes Pods (and other resources) to carry out the steps (components) in the workflow (pipeline). The Pods start containers, which in turn run their respective component code. The Kubeflow Pipelines backend manages the technical details of coordinating data passing and control flow at the container level.
The Why Behind Kubeflow Pipelines
The work of an AI/ML engineer involves structured experimentation and iteration on workflows with complex data processing and model preparation. These workflows demand specialized resources that are not only available on demand, but that can be closely coordinated toward optimized ML outcomes. Iteration needs are both near-term (for example, during exploratory analysis and model development) and long-term (for example, to correct data drift, or to add a new ML feature).
Kubeflow Pipelines (KFP) enables AI/ML engineers to define the structure of their workflows in Python and execute them on a Kubernetes cluster. KFP thus combines the power of Python for experimentation and development with the power of Kubernetes for resources and execution. This can translate to accelerated production-level AI/ML development, and ultimately to better AI/ML product outcomes. Some benefits include:
- Structured workflow management: organize and structure ML workflows, ensuring clarity and maintainability
- ML experimentation and iteration: enable modification and quick iterations while ensuring repeatable and consistent structure
- Reproducibility and versioning: support for versioning and run history
- Simplified execution: abstract away Kubernetes commands and YAML specifications, while still allowing access to lower levels when needed
- Optimization and efficiency: support caching, parallel execution, retries, and exit handling
- Enterprise features: support on-demand scaling, metadata and artifact tracking, and pipeline sharing
- ML-first design: an SDK and platform designed to support ML experimentation, development, and production needs
What Does a Pipeline Declaration Consist Of?
A KFP pipeline consists of the following key elements:
1. Input Parameters (see the first sketch after this list)
- name, description, display_name - pipeline-level metadata parameters
- pipeline_root - object storage root path for metadata and artifact persistence (see Pipeline Root)
- functional parameters - parameters to be passed as inputs to component steps at runtime
2. Component Tasks
- component tasks - define the individual pipeline steps that will run in containers
3. Control Logic (see the second sketch after this list)
- flow logic - sequential, parallel (see Control Flow)
- conditional logic - if, elif, else (see Control Flow/Conditions)
- parameter passing - support for small values of types like strings, numbers, lists, dicts, and booleans (see Pass small amounts of data between components)
- artifact inputs and outputs - datasets, models, markdown, HTML, metrics (see Create, use, pass, and track ML artifacts)
- exit handling - run exit tasks even when upstream tasks fail (see Control Flow/Exit Handling)
4. Runtime Logic (see the third sketch after this list)
- caching - cached outputs can be used per task (see Caching)
- retries - configure automatic retries for tasks
- resource requests and limits - request memory, CPU, and GPUs
- node selectors - request task containers to run on specific nodes
- environment variables - pass environment variables to task containers
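For illustration, here is a minimal sketch of how the pipeline-level metadata and functional parameters from item 1 map onto the `@dsl.pipeline` decorator. The pipeline name, bucket path, and parameter names are placeholders, not values prescribed by KFP:

```python
from kfp import dsl

@dsl.component
def train(threshold: float) -> str:
    '''A stand-in component so the pipeline has a task to run.'''
    return f"trained with threshold {threshold}"

@dsl.pipeline(
    name="metadata-example",                       # pipeline name
    display_name="Metadata Example",               # human-readable display name
    description="Shows pipeline-level metadata.",  # pipeline description
    pipeline_root="gs://my-bucket/pipeline-root",  # placeholder object storage path
)
def metadata_pipeline(threshold: float = 0.5):     # functional input parameter
    train(threshold=threshold)
```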
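Control-flow constructs from item 3 are used inside the pipeline function body. Below is a sketch using `dsl.If` (called `dsl.Condition` in older SDK versions) and `dsl.ExitHandler`; the components themselves are illustrative stand-ins:

```python
from kfp import dsl

@dsl.component
def is_positive(x: float) -> bool:
    return x > 0

@dsl.component
def notify(message: str):
    print(message)

@dsl.pipeline(name="control-flow-example")
def control_flow_pipeline(x: float = 1.0):
    exit_task = notify(message="pipeline finished")
    with dsl.ExitHandler(exit_task):  # exit_task runs even if tasks inside fail
        check_task = is_positive(x=x)
        with dsl.If(check_task.output == True):  # conditional branch on a task output
            notify(message="x is positive")
```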
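The runtime controls in item 4 are applied per task, via methods on the task object returned by a component call. A sketch with arbitrary resource values follows; node selectors typically require the kfp-kubernetes extension and are omitted here:

```python
from kfp import dsl

@dsl.component
def work(x: float) -> float:
    return x * 2

@dsl.pipeline(name="runtime-config-example")
def runtime_pipeline(x: float = 1.0):
    task = work(x=x)
    task.set_caching_options(False)                        # disable cached outputs
    task.set_retry(num_retries=3)                          # retry on failure
    task.set_cpu_limit("1")                                # resource limits
    task.set_memory_limit("2G")
    task.set_accelerator_type("nvidia.com/gpu")            # GPU request
    task.set_accelerator_limit(1)
    task.set_env_variable(name="LOG_LEVEL", value="INFO")  # environment variable
```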
Basic Pipeline Construction
The recommended way to define a pipeline is using the KFP Python SDK.
Steps:
1. Define individual components:
```python
# my_components.py
from kfp.dsl import component

@component()
def add(a: float, b: float) -> float:
    '''add two numbers'''
    return a + b

@component()
def multiply(a: float, b: float) -> float:
    '''multiply two numbers'''
    return a * b
```
2. Compose the components into a pipeline:
```python
# my_pipeline.py
import my_components
from kfp.dsl import pipeline

@pipeline()
def arithmetic_pipeline(a: float, b: float):
    # KFP components are instantiated with keyword arguments
    step1 = my_components.add(a=a, b=b)
    step2 = my_components.multiply(a=step1.output, b=2.0)

if __name__ == "__main__":
    from kfp.compiler import Compiler
    Compiler().compile(
        pipeline_func=arithmetic_pipeline,
        package_path="my_pipeline.yaml",
    )
```
Step 1 defines two basic components, which are saved in `my_components.py`. Step 2 defines a simple pipeline that takes float inputs `a` and `b`, adds them in the `add` component, and then multiplies the output of the `add` step by 2.0 in the `multiply` component.
A few things to note from the above construction:
- Since `step2` depends on the output of `step1`, the KFP backend knows to run the steps sequentially (when no dependency exists, steps run in parallel by default).
- `step1` and `step2` run in independent containers, and KFP takes care of transferring the needed parameter between them.
- The pipeline function, `arithmetic_pipeline`, is not run directly by the user, but rather compiled to IR YAML; calling the pipeline function does not run the steps. The compiler translates the pipeline and component Python code into instructions the KFP backend can understand.
The KFP Python SDK client (`kfp.client.Client`) supports submitting pipeline runs to the KFP backend either directly from the pipeline function (`create_run_from_pipeline_func`) or from the compiled pipeline YAML file (`create_run_from_pipeline_package`). The KFP UI also supports running pipelines once their YAML files have been uploaded (either from the GUI or the Python client interface).
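As a sketch, assuming a KFP backend reachable at a placeholder endpoint, a run of the pipeline above could be submitted from Python like this:

```python
from kfp.client import Client
from my_pipeline import arithmetic_pipeline

client = Client(host="http://localhost:8080")  # placeholder KFP API endpoint

# Option 1: submit the pipeline function directly (the SDK compiles it for you)
run = client.create_run_from_pipeline_func(
    arithmetic_pipeline,
    arguments={"a": 1.0, "b": 2.0},
)

# Option 2: submit the compiled IR YAML package
run = client.create_run_from_pipeline_package(
    "my_pipeline.yaml",
    arguments={"a": 1.0, "b": 2.0},
)
```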
The KFP documentation includes more detailed examples exploring pipeline implementation and runs. See the Getting Started Guide to quickly try running a pipeline on a KFP cluster. For a more in-depth treatment of pipeline implementation, visit the User Guide section of the documentation, including the sections on Core Functions and Data Handling.
Next steps
- Read an overview of Kubeflow Pipelines.
- Follow the pipelines quickstart guide to deploy Kubeflow and run a sample pipeline directly from the Kubeflow Pipelines UI.