DSL Overview

Introduction to the Kubeflow Pipelines domain-specific language (DSL)

The Kubeflow Pipelines DSL is a set of Python libraries that you can use to specify machine learning (ML) workflows, including pipelines and their components. (If you’re new to pipelines, see the conceptual guides to pipelines and components.)

The DSL compiler compiles your Python DSL code into a single static configuration (YAML) that the Pipeline Service can process. The Pipeline Service, in turn, converts the static configuration into a set of Kubernetes resources for execution.
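To make the compile step concrete, the following is a minimal, standard-library-only sketch of the general idea: run a pipeline function once with placeholder arguments, record the steps it declares, and serialize the result as a static document. It is not the SDK's compiler. The names toy_compile, Placeholder, and add_step, the image names, and the output layout are all hypothetical, and the sketch emits JSON rather than the YAML that the real compiler produces.

```python
import inspect
import json


class Placeholder:
    """Stand-in for a pipeline argument whose value is unknown at compile time."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return "{{" + self.name + "}}"


def toy_compile(pipeline_func):
    """Run the pipeline function once with placeholders and record its steps."""
    steps = []

    def add_step(name, image, arguments):
        # Record the step instead of running it; arguments are stored as strings.
        steps.append({"name": name, "image": image,
                      "arguments": [str(a) for a in arguments]})

    params = list(inspect.signature(pipeline_func).parameters)
    # The first parameter receives the step recorder; the rest are pipeline parameters.
    pipeline_func(add_step, *[Placeholder(p) for p in params[1:]])
    return json.dumps({"parameters": params[1:], "steps": steps}, indent=2)


# Hypothetical pipeline with two steps and two parameters.
def my_pipeline(add_step, data_url, epochs):
    add_step("download", "example/downloader", [data_url])
    add_step("train", "example/trainer", [epochs])


static_config = toy_compile(my_pipeline)
print(static_config)
```

The point of the sketch is that the pipeline function runs only at compile time, so the resulting configuration contains parameter placeholders rather than concrete values.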

Installing the DSL

The DSL is part of the Kubeflow Pipelines software development kit (SDK), which also includes Python libraries for interacting with the Kubeflow Pipelines APIs.

Follow the guide to installing the Kubeflow Pipelines SDK.

Introduction to main DSL functions and classes

This section introduces the DSL functions and classes that you use most often. You can see all classes and functions in the Kubeflow Pipelines DSL.

Pipelines

To create a pipeline, write your own pipeline function and use the DSL’s pipeline(name, description) function as a decorator.

Usage:

import kfp
from kfp.dsl import PipelineParam

@kfp.dsl.pipeline(
  name='My pipeline',
  description='My machine learning pipeline'
)
def my_pipeline(a: PipelineParam, b: PipelineParam):
  ...

Note: Do not use the Pipeline() class to create pipelines. Instead, define a pipeline function and decorate it with @kfp.dsl.pipeline, as described above. The class is mainly useful when you implement a compiler, where it gives you access to a pipeline object and its operations.

Components

To create a component for your pipeline, write your own component function and use the DSL’s component(func) function as a decorator.

Usage:

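import kfp
from kfp import dsl

@kfp.dsl.component
def my_component(my_param):
  ...
  # name and image are placeholder values for illustration
  return dsl.ContainerOp(name='my-component', image='my-image')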

The above component decorator requires the function to return a ContainerOp instance. The main purpose of using this decorator is to enable DSL static type checking.
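As a rough illustration of the two ideas above (the decorated function must return an operation object, and the decorator makes declared types available for checking), here is a standalone sketch in plain Python. ToyContainerOp, toy_component, and input_types are hypothetical names, and the logic is an assumption about how such a decorator could work, not the SDK's implementation.

```python
import functools


class ToyContainerOp:
    """Hypothetical stand-in for a container-based pipeline step."""

    def __init__(self, name, image):
        self.name = name
        self.image = image


def toy_component(func):
    """Require a ToyContainerOp result and attach the declared input types to it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        op = func(*args, **kwargs)
        if not isinstance(op, ToyContainerOp):
            raise TypeError(f"{func.__name__} must return a ToyContainerOp")
        # Keep the parameter annotations so a checker could compare them later.
        op.input_types = {k: v for k, v in func.__annotations__.items()
                          if k != "return"}
        return op
    return wrapper


@toy_component
def my_component(my_param: int):
    return ToyContainerOp(name="my-step", image="example/image")


@toy_component
def bad_component(my_param: int):
    return "not an op"  # rejected when the component is called


op = my_component(3)
print(op.input_types)
```

Calling bad_component raises a TypeError, which mirrors the requirement that a decorated component function return an operation object.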

Pipeline parameters

The PipelineParam(object) class represents a data type that you can pass between pipeline components.

You can use a PipelineParam object as an argument in your pipeline function. The object is then a pipeline parameter that shows up in the Kubeflow Pipelines UI. A PipelineParam can also represent an intermediate value that you pass between components.

Usage as an argument in a pipeline function:

import kfp
from kfp import dsl

@kfp.dsl.pipeline(
  name='My pipeline',
  description='My machine learning pipeline'
)
def my_pipeline(
    my_num = dsl.PipelineParam(name='num-foos', value=1000),
    my_name = dsl.PipelineParam(name='my-name', value='some text'),
    my_url = dsl.PipelineParam(name='foo-url', value='http://example.com')):
  ...

The DSL supports auto-conversion from string to PipelineParam. You can therefore write the same function like this:

import kfp

@kfp.dsl.pipeline(
  name='My pipeline',
  description='My machine learning pipeline'
)
def my_pipeline(
    my_num='1000',
    my_name='some text',
    my_url='http://example.com'):
  ...

See more about PipelineParam objects in the guide to building a component.
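The auto-conversion described above can be pictured with a small standalone sketch: inspect the pipeline function's signature and wrap every plain default value in a parameter object, while leaving defaults that are already parameter objects untouched. ToyParam and params_from_signature are hypothetical names, and this is an assumption about the general technique rather than the SDK's actual code.

```python
import inspect
from dataclasses import dataclass
from typing import Any


@dataclass
class ToyParam:
    """Hypothetical stand-in for a named pipeline parameter with a default value."""
    name: str
    value: Any = None


def params_from_signature(func):
    """Wrap each plain default in a ToyParam; keep existing ToyParam defaults."""
    result = {}
    for arg_name, p in inspect.signature(func).parameters.items():
        # Arguments without a default get a ToyParam whose value is None.
        default = None if p.default is inspect.Parameter.empty else p.default
        if isinstance(default, ToyParam):
            result[arg_name] = default
        else:
            result[arg_name] = ToyParam(name=arg_name, value=default)
    return result


def my_pipeline(my_num="1000", my_name="some text",
                my_url=ToyParam(name="foo-url", value="http://example.com")):
    ...


params = params_from_signature(my_pipeline)
for arg_name, param in params.items():
    print(arg_name, param)
```

In this sketch a plain default takes the argument name as its parameter name, whereas an explicit parameter object keeps the name you gave it.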

Types

The types module contains a list of types defined by the Kubeflow Pipelines SDK. Types include basic types like String, Integer, Float, and Bool, as well as domain-specific types like GCPProjectID and GCRPath.

See the guide to DSL static type checking.
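To give a feel for what a type check between components involves, the following standalone sketch compares the declared type of one component's output with the declared type of another component's input. The classes ToyString and ToyGCRPath and the function check_connection are hypothetical, and the matching rule shown here (subclass compatibility, with untyped values accepted) is an assumption for illustration, not the SDK's actual rule set.

```python
class ToyType:
    """Hypothetical base class; subclasses name the kinds of values passed around."""


class ToyString(ToyType):
    pass


class ToyGCRPath(ToyType):
    pass


def check_connection(output_type, input_type):
    """Return True when an output may feed an input; untyped ends accept anything."""
    if output_type is None or input_type is None:
        return True
    return issubclass(output_type, input_type)


print(check_connection(ToyGCRPath, ToyGCRPath))
print(check_connection(ToyString, ToyGCRPath))
print(check_connection(None, ToyString))
```

A check of this kind runs before the pipeline executes, so a mismatched connection is reported when you build the pipeline rather than when a step fails.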