Index of Reusable Components
A Kubeflow Pipelines component is a self-contained set of code that performs one step in the pipeline, such as data preprocessing, data transformation, model training, and so on. Each component is packaged as a Docker image. You can add existing components to your pipeline. These may be components that you have created yourself, or components that someone else has created and made available.
The Kubeflow Pipelines repository includes a variety of reusable components that you can add to your pipeline. This page highlights the components that include usage documentation in the form of README files.
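For orientation, here is a minimal sketch, assuming the kfp v1 Python SDK, of how an existing component is pulled into a pipeline: you load its component.yaml specification (for example from the kubeflow/pipelines repository on GitHub), and the loader returns a factory that you call inside a pipeline function. The URL below is a placeholder; each component's README gives the exact path.

```python
from kfp import components, dsl

# Placeholder URL: the component's README gives the real component.yaml path.
COMPONENT_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
                 '<release-tag>/components/<component>/component.yaml')

# Loading the specification returns a factory; calling the factory inside a
# pipeline function adds one step that runs the component's Docker image.
example_op = components.load_component_from_url(COMPONENT_URL)

@dsl.pipeline(name='Example pipeline',
              description='Single step built from a reusable component.')
def example_pipeline():
    # Arguments depend on the component's declared inputs (see its README).
    example_op()
```

The task object returned by the factory exposes the component's declared outputs (for example, task.outputs['some_output']), which is how one step's result is passed to the next.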
Cloud Machine Learning (ML) Engine
The following components submit jobs to Cloud ML Engine on Google Cloud Platform (GCP).
- Cloud ML Engine model training: Submits a Python training job to Cloud ML Engine. The job writes the trained model and other training results to a Cloud Storage location of your choice. Component output: the ID of the training job on Cloud ML Engine.
- Cloud ML Engine model deployment: Deploys a trained model to Cloud ML Engine from a Cloud Storage path. Component output: the Cloud ML Engine resource name of the deployed model version.
- Cloud ML Engine batch prediction: Submits a batch prediction request to a trained model deployed on Cloud ML Engine. The job writes the prediction results to a Cloud Storage location of your choice. Component output: the ID of the batch prediction job on Cloud ML Engine.
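A minimal sketch of how the training and deployment components above might be chained in a pipeline is shown below, assuming the kfp v1 SDK. The component URLs, the argument names (project_id, python_module, package_uris, job_dir, model_uri), and the 'job_dir' output name are illustrative assumptions; each component's README documents the exact interface.

```python
from kfp import components, dsl

# Placeholder URLs: take the real component.yaml paths from each README.
TRAIN_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
             '<release-tag>/components/gcp/ml_engine/train/component.yaml')
DEPLOY_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
              '<release-tag>/components/gcp/ml_engine/deploy/component.yaml')

mlengine_train_op = components.load_component_from_url(TRAIN_URL)
mlengine_deploy_op = components.load_component_from_url(DEPLOY_URL)

@dsl.pipeline(
    name='CMLE train and deploy',
    description='Train a model on Cloud ML Engine, then deploy it.')
def train_and_deploy(project_id: str,
                     python_module: str,
                     package_uris: str,   # JSON list of trainer package URIs
                     job_dir: str,        # Cloud Storage output location
                     region: str = 'us-central1'):
    # Submit the Python training job; results are written under job_dir.
    train_task = mlengine_train_op(
        project_id=project_id,
        python_module=python_module,
        package_uris=package_uris,
        region=region,
        job_dir=job_dir)

    # Deploy the trained model from the Cloud Storage path produced above.
    # The 'job_dir' output name is an assumption; check the component spec.
    mlengine_deploy_op(
        model_uri=train_task.outputs['job_dir'],
        project_id=project_id)
```

The batch prediction component follows the same pattern: load its component.yaml and call the resulting factory with a deployed model and an output Cloud Storage location.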
BigQuery
The following component submits a query job to BigQuery on GCP.
- BigQuery query: Submits a query to BigQuery and writes the query results to a Cloud Storage location of your choice. Component output: the Cloud Storage blob path where the query results are located.
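Below is a sketch of using the query component in a pipeline, again assuming the kfp v1 SDK. The component URL and the parameter and output names (query, project_id, output_gcs_path) are illustrative assumptions; the component's README has the authoritative interface.

```python
from kfp import components, dsl

# Placeholder URL: take the real component.yaml path from the README.
BQ_QUERY_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
                '<release-tag>/components/gcp/bigquery/query/component.yaml')

bigquery_query_op = components.load_component_from_url(BQ_QUERY_URL)

@dsl.pipeline(
    name='BigQuery extract',
    description='Run a BigQuery query and stage the results on Cloud Storage.')
def bigquery_extract(project_id: str, output_gcs_path: str):
    # Run the query; results land at output_gcs_path on Cloud Storage.
    query_task = bigquery_query_op(
        query='SELECT * FROM `bigquery-public-data.samples.natality` LIMIT 1000',
        project_id=project_id,
        output_gcs_path=output_gcs_path)
    # query_task.outputs exposes the Cloud Storage blob path of the results
    # (output name assumed here), which downstream steps can consume.
```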
Cloud Dataflow
The following components submit jobs to Cloud Dataflow on GCP.
- Dataflow Python Apache Beam job: Submits an Apache Beam job authored in Python to Cloud Dataflow. The Cloud Dataflow pipeline runner executes the Python code. Component output: the ID of the Dataflow job.
- Dataflow job from template: Submits a job to Cloud Dataflow based on a template. The template must be stored in Cloud Storage. Component output: the ID of the Dataflow job.
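Here is a sketch of launching a Dataflow job from a template and compiling the pipeline for upload, again assuming the kfp v1 SDK. The component URL, the argument names (gcs_path, launch_parameters, staging_dir), the Word_Count template path, and the gs://<your-bucket> locations are illustrative placeholders; the component's README documents the exact interface.

```python
import json
from kfp import compiler, components, dsl

# Placeholder URL: take the real component.yaml path from the README.
DF_TEMPLATE_URL = ('https://raw.githubusercontent.com/kubeflow/pipelines/'
                   '<release-tag>/components/gcp/dataflow/'
                   'launch_template/component.yaml')

dataflow_template_op = components.load_component_from_url(DF_TEMPLATE_URL)

@dsl.pipeline(
    name='Dataflow from template',
    description='Launch a Dataflow job from a template stored in Cloud Storage.')
def dataflow_from_template(
        project_id: str,
        template_path: str = 'gs://dataflow-templates/latest/Word_Count',
        launch_parameters: str = json.dumps({'parameters': {
            'inputFile': 'gs://dataflow-samples/shakespeare/kinglear.txt',
            'output': 'gs://<your-bucket>/wordcount/out'}}),
        staging_dir: str = 'gs://<your-bucket>/staging'):
    # Argument names are illustrative; see the component's README.
    dataflow_template_op(
        project_id=project_id,
        gcs_path=template_path,
        launch_parameters=launch_parameters,
        staging_dir=staging_dir)

# Compile the pipeline into a package that can be uploaded to Kubeflow Pipelines.
if __name__ == '__main__':
    compiler.Compiler().compile(dataflow_from_template, 'dataflow_pipeline.yaml')
```

The Python Apache Beam job component is used the same way, except that it takes the Cloud Storage path of your Beam Python file rather than a template path.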
More information
- For usage instructions for each of the above components, see the component's README file in the kubeflow/pipelines repository on GitHub.
- See how to build your own reusable components.
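For a feel of what building a component looks like, here is a minimal sketch using the kfp v1 SDK's lightweight-component helper, which wraps a plain Python function; the guide linked above covers the fuller approach of packaging your code as a Docker image with a component.yaml specification. The function, image, bucket, and file names below are illustrative.

```python
from kfp import components, dsl

def preprocess(input_path: str, rows: int) -> str:
    """Illustrative step: pretend to preprocess data and return a result path."""
    print(f'Would read {rows} rows from {input_path}')
    return input_path + '.processed'

# Wrap the function as a reusable component backed by a container image.
preprocess_op = components.create_component_from_func(
    preprocess,
    base_image='python:3.7',                          # image the step runs in
    output_component_file='preprocess.component.yaml' # shareable spec (optional)
)

@dsl.pipeline(name='Preprocess example',
              description='Single step built from a Python-function component.')
def preprocess_pipeline(input_path: str = 'gs://<your-bucket>/data.csv'):
    preprocess_op(input_path=input_path, rows=1000)
```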