Index of Reusable Components

A list of Kubeflow Pipelines components that you can use in your pipelines

A Kubeflow Pipelines component is a self-contained set of code that performs one step in the pipeline, such as data preprocessing, data transformation, model training, and so on. Each component is packaged as a Docker image. You can add existing components to your pipeline. These may be components that you create yourself, or that someone else has created and made available.

The Kubeflow Pipelines repository includes a variety of reusable components that you can add to your pipeline. This page highlights the components that include usage documentation in the form of README files.
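As a rough sketch of how a reusable component is pulled into a pipeline, the snippet below loads a component definition with the Kubeflow Pipelines SDK and uses it as a single pipeline step. The URL and the argument names are placeholders; each component's `component.yaml` file declares its real inputs and outputs.

```python
import kfp
from kfp import components, dsl

# Placeholder URL: point this at the component.yaml of the reusable
# component you want, for example one from the kubeflow/pipelines repository.
COMPONENT_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
                 'master/components/.../component.yaml')

# load_component_from_url turns the component definition into a Python
# factory function; calling it inside a pipeline adds one containerized step.
reusable_op = components.load_component_from_url(COMPONENT_URL)

@dsl.pipeline(
    name='reusable-component-demo',
    description='Runs a single step built from a reusable component.'
)
def demo_pipeline(project_id: str):
    # The keyword argument is a placeholder; arguments must match the
    # inputs declared in the component's component.yaml.
    step = reusable_op(project_id=project_id)

if __name__ == '__main__':
    # Compile the pipeline into a package that you can upload through the
    # Kubeflow Pipelines UI or submit with the SDK client.
    kfp.compiler.Compiler().compile(demo_pipeline, 'demo_pipeline.zip')
```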

Cloud Machine Learning (ML) Engine

The following components submit jobs to Cloud ML Engine on Google Cloud Platform (GCP).

Cloud ML Engine model training
Submits a Python training job to Cloud ML Engine. The job writes the trained model and other training results to a Cloud Storage location of your choice.

Component output: the ID of the training job on Cloud ML Engine.

Cloud ML Engine model deployment
Deploys a trained model to Cloud ML Engine from a Cloud Storage path.

Component output: the Cloud ML Engine resource name of the deployed model version.

Cloud ML Engine batch prediction
Submits a batch prediction request to a trained model deployed on Cloud ML Engine. The job writes the prediction results to a Cloud Storage location of your choice.

Component output: the ID of the batch prediction job on Cloud ML Engine.
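As an illustration of how these Cloud ML Engine components can be chained, the sketch below runs the training component and then deploys the model that the training job wrote to Cloud Storage. The component URLs and argument names are assumptions based on the descriptions above; check each component's `component.yaml` for its exact interface.

```python
from kfp import components, dsl

# Placeholder URLs for the Cloud ML Engine components in the
# kubeflow/pipelines repository (repository paths are illustrative).
TRAIN_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
             'master/components/gcp/ml_engine/train/component.yaml')
DEPLOY_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
              'master/components/gcp/ml_engine/deploy/component.yaml')

mlengine_train_op = components.load_component_from_url(TRAIN_URL)
mlengine_deploy_op = components.load_component_from_url(DEPLOY_URL)

@dsl.pipeline(
    name='train-and-deploy',
    description='Trains a model on Cloud ML Engine and deploys it.'
)
def train_and_deploy(project_id: str,
                     job_dir: str = 'gs://your-bucket/training-output'):
    # Argument names below are illustrative; the component.yaml files
    # declare the real inputs (trainer package, region, and so on).
    train_task = mlengine_train_op(
        project_id=project_id,
        job_dir=job_dir,            # training results are written here
    )

    deploy_task = mlengine_deploy_op(
        project_id=project_id,
        model_uri=job_dir,          # deploy from the same Cloud Storage path
    )
    # Ensure deployment only starts after the training job has finished.
    deploy_task.after(train_task)
```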

BigQuery

The following component submits a query job to BigQuery on GCP.

BigQuery query
Submits a query to BigQuery and writes the query results to a Cloud Storage location of your choice.

Component output: the Cloud Storage blob path where the query results are located.
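A minimal sketch of using the BigQuery query component in a pipeline: it runs a query and writes the results to a Cloud Storage path. The URL and argument names are assumptions; consult the component's `component.yaml` and README for the exact interface.

```python
from kfp import components, dsl

# Placeholder URL for the BigQuery query component (path is illustrative).
BIGQUERY_QUERY_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
                      'master/components/gcp/bigquery/query/component.yaml')

bigquery_query_op = components.load_component_from_url(BIGQUERY_QUERY_URL)

@dsl.pipeline(
    name='bigquery-query-demo',
    description='Runs a query and writes the results to Cloud Storage.'
)
def bigquery_pipeline(project_id: str,
                      output_gcs_path: str = 'gs://your-bucket/results.csv'):
    # Argument names are illustrative; see the component.yaml for the
    # inputs the component actually declares.
    query_task = bigquery_query_op(
        query='SELECT 1 AS example_column',
        project_id=project_id,
        output_gcs_path=output_gcs_path,
    )
    # The component output described above (the Cloud Storage blob path of
    # the query results) is available to downstream steps via query_task.outputs;
    # the exact output name is declared in the component.yaml.
```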

Cloud Dataflow

The following components submit jobs to Cloud Dataflow on GCP.

Dataflow Python Apache Beam job
Submits an Apache Beam job authored in Python to Cloud Dataflow. The Cloud Dataflow pipeline runner executes the Python code.

Component output: the ID of the Dataflow job.

Dataflow job from template
Submits a job to Cloud Dataflow based on a template. The template must be stored in Cloud Storage.

Component output: the ID of the Dataflow job.
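The sketch below shows how the Dataflow Python Apache Beam component might be wired into a pipeline: it points the component at a Beam script and a staging location in Cloud Storage, and the Dataflow job ID comes back as the component output. The URL and argument names are illustrative assumptions; the component's `component.yaml` defines the real interface.

```python
from kfp import components, dsl

# Placeholder URL for the Dataflow "launch Python" component
# (the repository path is illustrative).
DATAFLOW_PYTHON_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
                       'master/components/gcp/dataflow/launch_python/component.yaml')

dataflow_python_op = components.load_component_from_url(DATAFLOW_PYTHON_URL)

@dsl.pipeline(
    name='dataflow-beam-demo',
    description='Runs a Python Apache Beam job on Cloud Dataflow.'
)
def dataflow_pipeline(project_id: str,
                      python_file_path: str = 'gs://your-bucket/wordcount.py',
                      staging_dir: str = 'gs://your-bucket/staging'):
    # Argument names are illustrative; check the component.yaml for the
    # inputs the component actually declares.
    beam_task = dataflow_python_op(
        python_file_path=python_file_path,
        project_id=project_id,
        staging_dir=staging_dir,
    )
    # beam_task.outputs exposes the Dataflow job ID described above.
```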

More information