Version v0.5 of the documentation is no longer actively maintained. The site that you are currently viewing is an archived snapshot. For up-to-date documentation, see the latest version.

TensorFlow Serving

Serving TensorFlow models

Serving a model

We treat each deployed model as two components in your APP: one tf-serving-deployment, and one tf-serving-service. We can think of the service as a model, and the deployment as the version of the model.

Generate the service(model) component

ks generate tf-serving-service mnist-service
ks param set mnist-service modelName mnist    // match your deployment mode name
ks param set mnist-service trafficRule v1:100    // optional, it's the default value
ks param set mnist-service serviceType LoadBalancer    // optional, change type to LoadBalancer to expose external IP

Generate the deployment(version) component

ks generate tf-serving-deployment-gcp ${MODEL_COMPONENT}
ks param set ${MODEL_COMPONENT} modelName mnist
ks param set ${MODEL_COMPONENT} versionName v1   // optional, it's the default value
ks param set ${MODEL_COMPONENT} modelBasePath gs://kubeflow-examples-data/mnist
ks param set ${MODEL_COMPONENT} gcpCredentialSecretName user-gcp-sa
ks param set ${MODEL_COMPONENT} injectIstio true   // If you want to use istio

We enable TF Serving’s REST API, and it’s able to serve HTTP requests. The API is the same as our http proxy before.

Pointing to the model

Depending where model file is located, set correct parameters

Google cloud

Set the param as above section.

We need a service account that can access the model. If you are using Kubeflow’s click-to-deploy app, there should be already a secret, user-gcp-sa, in the cluster.

The model at gs://kubeflow-examples-data/mnist is publicly accessible. However, if your environment doesn’t have google cloud credential setup, TF serving will not be able to read the model. See this issue for example. To setup the google cloud credential, you should either have the environment variable GOOGLE_APPLICATION_CREDENTIALS pointing to the credential file, or run gcloud auth login. See doc for more detail.


To use S3, generate a different prototype

ks generate tf-serving-deployment-aws ${MODEL_COMPONENT} --name=${MODEL_NAME}

First you need to create secret that will contain access credentials. Use base64 to encode your credentials and check details in the Kubernetes guide to creating a secret manually

apiVersion: v1
  name: secretname
kind: Secret

Enable S3, set url and point to correct Secret

ks param set ${MODEL_COMPONENT} modelBasePath ${MODEL_PATH}
ks param set ${MODEL_COMPONENT} s3Enable true
ks param set ${MODEL_COMPONENT} s3SecretName secretname

Optionally you can also override default parameters of S3

# S3 region
ks param set ${MODEL_COMPONENT} s3AwsRegion us-west-1

# Whether or not to use https for S3 connections
ks param set ${MODEL_COMPONENT} s3UseHttps true

# Whether or not to verify https certificates for S3 connections
ks param set ${MODEL_COMPONENT} s3VerifySsl true

# URL for your s3-compatible endpoint.
ks param set ${MODEL_COMPONENT} s3Endpoint

Using GPU

To serve a model with GPU, first make sure your Kubernetes cluster has a GPU node. Then set an additional param:

ks param set ${MODEL_COMPONENT} numGpus 1

There is an example for serving an object detection model with GPU.


export KF_ENV=default
ks apply ${KF_ENV} -c mnist-service
ks apply ${KF_ENV} -c ${MODEL_COMPONENT}

The KF_ENV environment variable represents a conceptual deployment environment such as development, test, staging, or production, as defined by ksonnet. For this example, we use the default environment. You can read more about Kubeflow’s use of ksonnet in the Kubeflow ksonnet component guide.

Sending prediction request directly

If the service type is LoadBalancer, it will have its own accessible external ip. Get the external ip by:

kubectl get svc mnist-service

And then send the request

curl -X POST -d @input.json http://EXTERNAL_IP:8500/v1/models/mnist:predict

Sending prediction request through ingress and IAP

If the service type is ClusterIP, you can access through ingress. It’s protected and only one with right credentials can access the endpoint. Below shows how to programmatically authenticate a service account to access IAP.

  1. Save the client ID that you used to deploy Kubeflow as IAP_CLIENT_ID.
  2. Create a service account gcloud iam service-accounts create --project=$PROJECT $SERVICE_ACCOUNT
  3. Grant the service account access to IAP enabled resources: gcloud projects add-iam-policy-binding $PROJECT \ --role roles/iap.httpsResourceAccessor \ --member serviceAccount:$SERVICE_ACCOUNT
  4. Download the service account key: gcloud iam service-accounts keys create ${KEY_FILE} \ --iam-account ${SERVICE_ACCOUNT}@${PROJECT}
  5. Export the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to the key file of the service account.

Finally, you can send the request with this python script

python https://YOUR_HOST/tfserving/models/mnist IAP_CLIENT_ID --input=YOUR_INPUT_FILE

Telemetry and Rolling out model using Istio

Please look at the Istio guide.

Logs and metrics with Stackdriver

See the guide to logging and monitoring for instructions on getting logs and metrics using Stackdriver.

Last modified 28.04.2019: Resolved typos (#665) (44f20b5)