Index of Reusable Components

A list of Kubeflow Pipelines components that you can use in your pipelines

A Kubeflow Pipelines component is a self-contained set of code that performs one step in the pipeline, such as data preprocessing, data transformation, model training, and so on. Each component is packaged as a Docker image. You can add existing components to your pipeline. These may be components that you create yourself, or that someone else has created and made available.

The Kubeflow Pipelines repository includes a variety of reusable components that you can add to your pipeline. This page highlights the components that include usage documentation in the form of README files.
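As a rough sketch of how a reusable component is pulled into a pipeline, the snippet below loads a component definition with the Kubeflow Pipelines SDK and uses it as a single pipeline step. The URL and the argument names are placeholders; each component's `component.yaml` file declares its real inputs and outputs.

```python
import kfp
from kfp import components, dsl

# Placeholder URL: point this at the component.yaml of the reusable
# component you want, for example one from the kubeflow/pipelines repository.
COMPONENT_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
                 'master/components/.../component.yaml')

# load_component_from_url turns the component definition into a Python
# factory function; calling it inside a pipeline adds one containerized step.
reusable_op = components.load_component_from_url(COMPONENT_URL)

@dsl.pipeline(
    name='reusable-component-demo',
    description='Runs a single step built from a reusable component.'
)
def demo_pipeline(project_id: str):
    # The keyword argument is a placeholder; arguments must match the
    # inputs declared in the component's component.yaml.
    step = reusable_op(project_id=project_id)

if __name__ == '__main__':
    # Compile the pipeline into a package that you can upload through the
    # Kubeflow Pipelines UI or submit with the SDK client.
    kfp.compiler.Compiler().compile(demo_pipeline, 'demo_pipeline.zip')
```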

Cloud Machine Learning (ML) Engine

The following components submit jobs to Cloud ML Engine on Google Cloud Platform (GCP).

Cloud ML Engine model training
Submits a Python training job to Cloud ML Engine. The job writes the trained model and other training results to a Cloud Storage location of your choice.

Component output: the ID of the training job on Cloud ML Engine.

Cloud ML Engine model deployment
Deploys a trained model to Cloud ML Engine from a Cloud Storage path.

Component output: the Cloud ML Engine resource name of the deployed model version.

Cloud ML Engine batch prediction
Submits a batch prediction request to a trained model deployed on Cloud ML Engine. The job writes the prediction results to a Cloud Storage location of your choice.

Component output: the ID of the batch prediction job on Cloud ML Engine.
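As an illustration of how these Cloud ML Engine components can be chained, the sketch below runs the training component and then deploys the model that the training job wrote to Cloud Storage. The component URLs and argument names are assumptions based on the descriptions above; check each component's `component.yaml` for its exact interface.

```python
from kfp import components, dsl

# Placeholder URLs for the Cloud ML Engine components in the
# kubeflow/pipelines repository (repository paths are illustrative).
TRAIN_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
             'master/components/gcp/ml_engine/train/component.yaml')
DEPLOY_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
              'master/components/gcp/ml_engine/deploy/component.yaml')

mlengine_train_op = components.load_component_from_url(TRAIN_URL)
mlengine_deploy_op = components.load_component_from_url(DEPLOY_URL)

@dsl.pipeline(
    name='train-and-deploy',
    description='Trains a model on Cloud ML Engine and deploys it.'
)
def train_and_deploy(project_id: str,
                     job_dir: str = 'gs://your-bucket/training-output'):
    # Argument names below are illustrative; the component.yaml files
    # declare the real inputs (trainer package, region, and so on).
    train_task = mlengine_train_op(
        project_id=project_id,
        job_dir=job_dir,            # training results are written here
    )

    deploy_task = mlengine_deploy_op(
        project_id=project_id,
        model_uri=job_dir,          # deploy from the same Cloud Storage path
    )
    # Ensure deployment only starts after the training job has finished.
    deploy_task.after(train_task)
```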

BigQuery

The following component submits a query job to BigQuery on GCP.

BigQuery query
Submits a query to BigQuery and writes the query results to a Cloud Storage location of your choice.

Component output: the Cloud Storage blob path where the query results are located.
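A minimal sketch of using the BigQuery query component in a pipeline: it runs a query and writes the results to a Cloud Storage path. The URL and argument names are assumptions; consult the component's `component.yaml` and README for the exact interface.

```python
from kfp import components, dsl

# Placeholder URL for the BigQuery query component (path is illustrative).
BIGQUERY_QUERY_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
                      'master/components/gcp/bigquery/query/component.yaml')

bigquery_query_op = components.load_component_from_url(BIGQUERY_QUERY_URL)

@dsl.pipeline(
    name='bigquery-query-demo',
    description='Runs a query and writes the results to Cloud Storage.'
)
def bigquery_pipeline(project_id: str,
                      output_gcs_path: str = 'gs://your-bucket/results.csv'):
    # Argument names are illustrative; see the component.yaml for the
    # inputs the component actually declares.
    query_task = bigquery_query_op(
        query='SELECT 1 AS example_column',
        project_id=project_id,
        output_gcs_path=output_gcs_path,
    )
    # The component output described above (the Cloud Storage blob path of
    # the query results) is available to downstream steps via query_task.outputs;
    # the exact output name is declared in the component.yaml.
```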

Cloud Dataflow

The following components submit jobs to Cloud Dataflow on GCP.

Dataflow Python Apache Beam job
Submits an Apache Beam job authored in Python to Cloud Dataflow. The Cloud Dataflow pipeline runner executes the Python code.

Component output: the ID of the Dataflow job.

Dataflow job from template
Submits a job to Cloud Dataflow based on a template. The template must be stored in Cloud Storage.

Component output: the ID of the Dataflow job.
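The sketch below shows how the Dataflow Python Apache Beam component might be wired into a pipeline: it points the component at a Beam script and a staging location in Cloud Storage, and the Dataflow job ID comes back as the component output. The URL and argument names are illustrative assumptions; the component's `component.yaml` defines the real interface.

```python
from kfp import components, dsl

# Placeholder URL for the Dataflow "launch Python" component
# (the repository path is illustrative).
DATAFLOW_PYTHON_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
                       'master/components/gcp/dataflow/launch_python/component.yaml')

dataflow_python_op = components.load_component_from_url(DATAFLOW_PYTHON_URL)

@dsl.pipeline(
    name='dataflow-beam-demo',
    description='Runs a Python Apache Beam job on Cloud Dataflow.'
)
def dataflow_pipeline(project_id: str,
                      python_file_path: str = 'gs://your-bucket/wordcount.py',
                      staging_dir: str = 'gs://your-bucket/staging'):
    # Argument names are illustrative; check the component.yaml for the
    # inputs the component actually declares.
    beam_task = dataflow_python_op(
        python_file_path=python_file_path,
        project_id=project_id,
        staging_dir=staging_dir,
    )
    # beam_task.outputs exposes the Dataflow job ID described above.
```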

More information