jhammarstedt / MLOps-Kubeflow_in_GCP

Small Kubeflow pipeline in GCP with CI&CD components

MLOps: CI/CD with Kubeflow Pipelines in GCP

This repo demonstrates how to take the first step towards MLOps by setting up and deploying a simple ML CI/CD pipeline using Google Cloud's AI Platform, Kubeflow, and Docker.

✍ Authors

πŸ—Ί Overview

The following topics will be covered:

  1. Building each task as a Docker container and running them with Cloud Build
    • Preprocessing step: loading data from a GCS bucket, editing it, and storing a new file
    • Training: creating a PyTorch model and building a custom prediction routine (GCP mainly supports TensorFlow, but you can add custom models)
    • Deployment: deploying your custom model to the AI Platform with version control
  2. Creating a Kubeflow pipeline and connecting the above tasks
  3. Performing CI by setting up GitHub triggers in Cloud Build that rebuild a container upon a push to the repository
  4. Performing CD by using Cloud Functions to trigger the pipeline upon uploading new data to your bucket


πŸ“½ Video Demo

There's a short video demo of the project available here.

Note that it was created for a DevOps course at KTH with a 3-minute limit and is therefore very brief and compressed to fit that requirement.

πŸŒ‰ Setting up the pipeline

Here we will go through the process of running the pipeline step by step. (Note: at the moment there are some hard-coded project names/repos etc. that you might want to change; this will be updated eventually.)

  1. Create a GCP project, open the Cloud Shell (make sure you're in the project), and clone the repository:

    $ git clone https://github.com/jhammarstedt/gcloud_MLOPS_demo.git

  2. Create a Kubeflow Pipelines instance (this provides the pipelines dashboard used later)

  3. Run the $ ./scripts/set_auth.sh script in the Google Cloud Shell (you may want to change the SA_NAME); this grants the roles we need to run the pipeline

  4. Create a project bucket and a data bucket (used for CD later); here we named them {PROJECT_NAME}_bucket and {PROJECT_NAME}-data-bucket

  • In the general project bucket, add the following subfolders: models, packages, data
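    If you prefer to script the bucket setup, here is a minimal sketch using the google-cloud-storage client; it assumes you are authenticated against the project, and the placeholder project id and bucket names simply follow the convention above.

      # Hedged sketch: create the two buckets and the "subfolders" with the
      # google-cloud-storage client. The project id is a placeholder.
      from google.cloud import storage

      PROJECT = "your-project-id"  # hypothetical project id
      client = storage.Client(project=PROJECT)

      project_bucket = client.create_bucket(f"{PROJECT}_bucket")
      data_bucket = client.create_bucket(f"{PROJECT}-data-bucket")

      # GCS has no real folders; empty placeholder objects make the
      # models/packages/data "subfolders" show up in the console.
      for prefix in ("models/", "packages/", "data/"):
          project_bucket.blob(prefix).upload_from_string("")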
  5. Locally, create a package from the models directory in the containers/train folder by running $ python containers/train/models/setup.py sdist. This creates a package with PyTorch and the model structure; just drag and drop it into the packages subfolder. (A sketch of such a setup.py is shown below.)
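    The repo ships its own setup.py; purely as an illustration, a minimal packaging script of this kind might look like the following sketch (package name, version, and dependencies are assumptions).

      # containers/train/models/setup.py -- hedged sketch, not the repo's exact file.
      from setuptools import find_packages, setup

      setup(
          name="trainer",              # illustrative package name
          version="0.1",
          packages=find_packages(),
          install_requires=["torch"],  # PyTorch is bundled alongside the model code
          description="Model code for the AI Platform custom prediction routine",
      )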

  6. Create a Docker container for each step (each folder in the containers directory represents a different step). Do this by running ./build_containers.sh from gcloud_MLOPS_demo/containers in the Cloud Shell.

    This will run build_single_container.sh in each directory.

    • If you wish to build just one container, enter the directory you want to build and run:

      $ bash ../build_single_container.sh {directory name}

  7. Each subfolder (which will become a container) includes:

    • A cloudbuild.yaml file (created in build_single_repo.sh) that lets Cloud Build create a Docker image by running the included Dockerfile.

    • The Dockerfile, which mainly runs the task script (e.g. deploy.sh)

    • A task script that tells the Docker container what to do (e.g. preprocess the data, train the model, or deploy the trained model to the AI Platform); a sketch of the argument interface is shown below
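    The actual task scripts may be shell or Python (e.g. deploy.sh for deployment); purely to illustrate the argument interface used in the docker run example in the next step (--project, --bucket, and a final run-mode argument), a Python skeleton could look like this. The "cloud" mode name is an assumption.

      # task.py -- hedged skeleton of a step's entry point, not the repo's code.
      import argparse

      def main():
          parser = argparse.ArgumentParser(description="One pipeline step")
          parser.add_argument("--project", required=True, help="GCP project id")
          parser.add_argument("--bucket", required=True, help="GCS bucket to read/write")
          parser.add_argument("mode", choices=["local", "cloud"],
                              help="where the step is running")
          args = parser.parse_args()
          # ... load data from args.bucket, run the step, write results back ...
          print(f"Running in {args.mode} mode against {args.project}/{args.bucket}")

      if __name__ == "__main__":
          main()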

  8. To test a container manually, run:

    $ docker run -t gcr.io/{YOUR_PROJECT}/{IMAGE}:latest --project {YOUR_PROJECT} --bucket {YOUR_BUCKET} local

    For example, to run the container that deploys the model to the AI Platform:

    $ docker run -t gcr.io/ml-pipeline-309409/ml-demo-deploy-toai

  9. Create a pipeline in Python using the Kubeflow Pipelines SDK (currently a notebook in AI Platform)

  10. Now we can either run the pipeline manually from the Kubeflow Pipelines dashboard (from step 2) or run it as a script; a minimal sketch of both the pipeline definition and a scripted run follows.
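    A hedged sketch of what the pipeline definition and a scripted run could look like with the kfp SDK (v1-style API); the image names for the preprocess and train steps, the run-mode argument, and the pipeline host are assumptions.

      # pipeline.py -- minimal kfp (v1-style) sketch, not the repo's exact pipeline.
      import kfp
      from kfp import dsl

      PROJECT = "your-project-id"   # placeholder
      BUCKET = f"{PROJECT}_bucket"

      @dsl.pipeline(name="ml-demo", description="preprocess -> train -> deploy")
      def ml_pipeline(project: str = PROJECT, bucket: str = BUCKET):
          preprocess = dsl.ContainerOp(
              name="preprocess",
              image=f"gcr.io/{PROJECT}/ml-demo-preprocess:latest",   # assumed image name
              arguments=["--project", project, "--bucket", bucket, "cloud"],
          )
          train = dsl.ContainerOp(
              name="train",
              image=f"gcr.io/{PROJECT}/ml-demo-train:latest",        # assumed image name
              arguments=["--project", project, "--bucket", bucket, "cloud"],
          ).after(preprocess)
          dsl.ContainerOp(
              name="deploy",
              image=f"gcr.io/{PROJECT}/ml-demo-deploy-toai:latest",  # see the docker example above
              arguments=["--project", project, "--bucket", bucket, "cloud"],
          ).after(train)

      if __name__ == "__main__":
          # Scripted run: submit directly to the Kubeflow Pipelines host
          # (the PIPELINE_HOST URL from the pipeline settings).
          client = kfp.Client(host="https://<your-pipeline-host>")
          client.create_run_from_pipeline_func(ml_pipeline, arguments={})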

πŸ›  CI

To set up CI and rebuild at every push:

  • Connect Cloud Build to GitHub, either in the Triggers UI or by running: $ ./scripts/setup_trigger.sh
  • Push the newly created cloudbuild files from GCP to origin, otherwise the trigger won't find them
  • This trigger will run every time a push to master touches any of the containers and will rebuild the affected Docker image

πŸ“¦ CD

CD is useful when we want to retrain/fine-tune the model whenever we get new data, rather than every time we update a component. So we will have a Cloud Function that triggers a training pipeline when we upload new data to Cloud Storage; a hedged sketch of such a function follows.
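As a sketch (1st-gen, GCS-triggered Python Cloud Function), the function body might look like the snippet below; the compiled pipeline file name and run name are assumptions, and PIPELINE_HOST matches the environment variable used in the steps below.

    # main.py -- hedged sketch of the Cloud Function body, not the repo's exact code.
    import os
    import kfp

    def trigger_pipeline(event, context):
        """Runs on google.storage.object.finalize events from the data bucket."""
        client = kfp.Client(host=os.environ["PIPELINE_HOST"])
        client.create_run_from_pipeline_package(
            "pipeline.yaml",                         # assumed: compiled pipeline bundled with the function
            arguments={},
            run_name=f"retrain-on-{event['name']}",  # name of the uploaded file
        )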

  1. Get the pipeline host URL from the pipeline settings (ideally save it as a PIPELINE_HOST environment variable).

  2. In the pipeline folder, run the deploy script:

    $ ./deploy_cloudfunction $PIPELINE_HOST

  3. Now, whenever a new file is added to or deleted from the project bucket, the pipeline will be rerun.

πŸ‘“ Resources used and further reading
