ModelDB
Introduction
ModelDB is an end-to-end system for managing machine learning models. It ingests models and associated metadata as they are trained, stores the model data in a structured format, and surfaces it through a web frontend for rich querying. ModelDB can be used with any ML environment via the ModelDB Light API, and native ModelDB clients provide advanced support for spark.ml and scikit-learn.
For more info see here.
Deploying ModelDB
Use the commands below to deploy ModelDB:
ks generate modeldb modeldb
ks apply default -c modeldb
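To confirm that the deployment came up, you can list the ModelDB pods (this assumes ModelDB was deployed into the kubeflow namespace, as in the rest of this guide):
kubectl get pods --namespace kubeflow | grep modeldb
The modeldb-db, modeldb-backend, and modeldb-frontend pods should all reach the Running state.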
Concepts
ModelDB organizes model data in a three-level hierarchy, from bottom to top (a short code sketch follows this list):
- ExperimentRun: every execution of a script/program creates an ExperimentRun.
- Experiment: related ExperimentRuns can be grouped into an Experiment (e.g., “running hyperparameter optimization for the Neural Network”).
- Project: Finally, all Experiments and ExperimentRuns belong to a Project (e.g., “churn prediction”).
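These three levels map directly onto the client calls used later in this guide; here is a minimal sketch, where the names and connection details are placeholders:
from verta import ModelDBClient

HOST = "<modeldb-backend-proxy host>"   # see the Setup step below
PORT = "<modeldb-backend-proxy port>"
client = ModelDBClient(HOST, PORT)

project = client.set_project(proj_name="churn prediction")          # Project: the overall goal
experiment = client.set_experiment(expt_name="hyperparameter opt")  # Experiment: a strategy within the project
run = client.set_experiment_run(run_name="run 1")                   # ExperimentRun: one execution of a script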
Classes (a short code sketch follows this list):
- Datasets take file paths and optional metadata; a tag (key) is associated with each Dataset (value).
- Model takes the model type, the model itself, and the path to the model as arguments.
- ModelConfig takes the model type and the model configuration.
- ModelMetrics takes the metrics to record as its argument.
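A rough sketch of how these classes could be instantiated with the ModelDB Light API; the import path, argument values, and model_obj below are assumptions for illustration, so check the Light API documentation for the exact interface:
from modeldb.basic.Structs import Dataset, Model, ModelConfig, ModelMetrics  # assumed import path

# a tag (key) identifies each Dataset (value)
datasets = {
    "train": Dataset("/path/to/train.csv", {"num_rows": 10000}),
    "test": Dataset("/path/to/test.csv", {"num_rows": 2500}),
}

model_obj = None  # placeholder for a trained model object
model = Model("gradient boosting", model_obj, "/path/to/model.pkl")     # model type, model, path to model
model_config = ModelConfig("gradient boosting", {"n_estimators": 100})  # model type, model config
model_metrics = ModelMetrics({"accuracy": 0.92})                        # metrics to record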
Using ModelDB
After ModelDB is deployed and the modeldb-db, modeldb-backend, and modeldb-frontend pods are running, follow the steps below.
Install ModelDB
ModelDB is now part of the verta library. verta is compatible with Python 3.5+, and the latest verta releases are available as source packages over pip. When using pip, it is generally recommended to install packages in a virtual environment to avoid modifying system state.
Check your Python version:
python --version
Create and activate a new environment:
python -m venv .env
source .env/bin/activate
Install verta:
pip install verta==versionNumber
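To verify the installation, you can ask pip which version was installed:
pip show verta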
Setup
Get the host and port details of the ModelDB backend proxy:
kubectl get service modeldb-backend-proxy --namespace kubeflow
Configure HOST and PORT to connect to the ModelDB backend:
from verta import ModelDBClient
HOST = ""
PORT = ""
client = ModelDBClient(HOST, PORT)
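The values for HOST and PORT are the host and port reported for the modeldb-backend-proxy service above. As a convenience (this is an ordinary Python pattern, not part of the verta API), you can read them from environment variables instead of hard-coding them:
import os
from verta import ModelDBClient

# MODELDB_HOST and MODELDB_PORT are placeholder environment variable names
HOST = os.environ.get("MODELDB_HOST", "")
PORT = os.environ.get("MODELDB_PORT", "")
client = ModelDBClient(HOST, PORT)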
Creating a project
Begin by creating a project, an experiment within it, and a run within the experiment. Each experiment can represent a strategy for solving the problem, and each run records one execution of that strategy.
project = client.set_project(proj_name="My Project")           # a project is a goal
experiment = client.set_experiment(expt_name="My Experiment")  # a strategy for the project
run = client.set_experiment_run(run_name="First run")
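Within the same client session, further calls to set_experiment_run create additional runs under the experiment set above (behavior assumed from the client's stateful API), which makes it easy to compare strategies; the run names here are illustrative:
run_baseline = client.set_experiment_run(run_name="Baseline")
run_tuned = client.set_experiment_run(run_name="Tuned hyperparameters")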
Logging hyperparameters, metrics and datasets
Use the run.log_xxx() methods in your code to record hyperparameters, metrics, datasets, models, and other artifacts. For example (X_train, X_test, y_train, and y_test are assumed to come from an earlier train/test split):
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.externals import joblib  # on newer scikit-learn versions, use "import joblib" instead

# Hyperparameters
param_grid = {
    'n_estimators': [100],
    'learning_rate': [0.1, 0.02],
    'max_depth': [6, 4],
    'max_leaf_nodes': [3, 15],
    'max_features': [1.0, 0.1],
}
for h, v in param_grid.items():
    run.log_hyperparameter(h, v)

# Metrics
hyperparameters = {h: v[0] for h, v in param_grid.items()}  # pick one value per hyperparameter for this run
model = GradientBoostingRegressor(**hyperparameters)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
run.log_metric("Accuracy_train", train_score)
run.log_metric("Accuracy_test", test_score)

# Model artifact
# save models with either joblib or pickle
filename_2 = "simple_model_gbr_2.joblib"
joblib.dump(model, filename_2)
run.log_model("model_gbr_2", filename_2)
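Logged values can also be read back through the client. The exact getters depend on your verta version; recent clients expose methods such as run.get_metrics() and run.get_hyperparameters(), but treat these names as assumptions and check them against your installed version:
print(run.get_hyperparameters())  # assumed getter: returns the logged hyperparameters
print(run.get_metrics())          # assumed getter: returns the logged metrics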
View your models in the webapp
Get the IP address of the ModelDB webapp service and open it in a browser:
kubectl get service modeldb-webapp --namespace kubeflow
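If the service does not expose an external IP on your cluster, port-forwarding is an alternative; replace <service-port> with the port reported by the command above and then open http://localhost:8080:
kubectl port-forward service/modeldb-webapp 8080:<service-port> --namespace kubeflow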
Samples
These notebooks show how datasets, models, model configurations, and model metrics can be initialized and logged into ModelDB: