jhammarstedt / MLOps-Kubeflow_in_GCP

Small Kubeflow pipeline in GCP with CI&CD components

MLOps: CI/CD with Kubeflow Pipelines in GCP

This repo demonstrates how to take the first step towards MLOps by setting up and deploying a simple ML CI/CD pipeline using Google Cloud's AI Platform, Kubeflow, and Docker.

✍ Authors

πŸ—Ί Overview

The following topics will be covered:

  1. Building each task as a Docker container and running them with Cloud Build
    • Preprocessing step: loading data from a GCS bucket, editing it, and storing a new file
    • Training: creating a PyTorch model and building a custom prediction routine (GCP mainly supports TensorFlow, but you can add custom models)
    • Deployment: deploying your custom model to the AI Platform with version control
  2. Creating a Kubeflow pipeline and connecting the above tasks
  3. Performing CI by setting up GitHub triggers in Cloud Build that rebuild a container upon a push to the repository
  4. Performing CD by using Cloud Functions to trigger the pipeline upon uploading new data to your bucket


πŸ“½ Video Demo

There's a short video demo of the project available here.

Note that it was created for a DevOps course at KTH with a 3-minute limit and is therefore very brief and compressed to fit that requirement.

πŸŒ‰ Setting up the pipeline

Here we will go through the process of running the pipeline step by step. (Note: at the moment there are some hard-coded project names/repos etc. that you might want to change; this will be updated eventually.)

  1. Create a GCP project, open the Cloud Shell (make sure you're in the project), and clone the repository:

    $ git clone https://github.com/jhammarstedt/gcloud_MLOPS_demo.git

  2. Create a Kubeflow Pipelines instance (this provides the pipelines dashboard used later)

  3. Run the $ ./scripts/set_auth.sh script in the Google Cloud Shell (you may want to change the SA_NAME); this grants the roles we need to run the pipeline

  4. Create a project bucket and a data bucket (used for CD later); here we named them {PROJECT_NAME}_bucket and {PROJECT_NAME}-data-bucket

  • In the general project bucket, add the following subfolders: models, packages, data
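    If you prefer to script the bucket setup, here is a minimal sketch using the google-cloud-storage client; it assumes you are authenticated against the project, and the placeholder project id and bucket names simply follow the convention above.

      # Hedged sketch: create the two buckets and the "subfolders" with the
      # google-cloud-storage client. The project id is a placeholder.
      from google.cloud import storage

      PROJECT = "your-project-id"  # hypothetical project id
      client = storage.Client(project=PROJECT)

      project_bucket = client.create_bucket(f"{PROJECT}_bucket")
      data_bucket = client.create_bucket(f"{PROJECT}-data-bucket")

      # GCS has no real folders; empty placeholder objects make the
      # models/packages/data "subfolders" show up in the console.
      for prefix in ("models/", "packages/", "data/"):
          project_bucket.blob(prefix).upload_from_string("")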
  5. Locally, create a package from the models directory in the containers/train folder by running $ python containers/train/models/setup.py sdist. This creates a package with PyTorch and the model structure; just drag and drop it into the packages subfolder. (A sketch of such a setup.py is shown below.)
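    The repo ships its own setup.py; purely as an illustration, a minimal packaging script of this kind might look like the following sketch (package name, version, and dependencies are assumptions).

      # containers/train/models/setup.py -- hedged sketch, not the repo's exact file.
      from setuptools import find_packages, setup

      setup(
          name="trainer",              # illustrative package name
          version="0.1",
          packages=find_packages(),
          install_requires=["torch"],  # PyTorch is bundled alongside the model code
          description="Model code for the AI Platform custom prediction routine",
      )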

  6. Create a Docker container for each step (each folder in the containers directory represents a different step). Do this by running ./build_containers.sh from gcloud_MLOPS_demo/containers in the Cloud Shell.

    This will run build_single_container.sh in each directory.

    • If you wish to build just one container, enter the directory you want to build and run:

      $ bash ../build_single_container.sh {directory name}

  7. Each subfolder (which will become a container) includes:

    • A cloudbuild.yaml file (created in build_single_repo.sh) that lets Cloud Build create a Docker image by running the included Dockerfile.

    • The Dockerfile, which mainly runs the task script (e.g. deploy.sh)

    • A task script that tells the Docker container what to do (e.g. preprocess the data, train the model, or deploy the trained model to the AI Platform); a sketch of the argument interface is shown below
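    The actual task scripts may be shell or Python (e.g. deploy.sh for deployment); purely to illustrate the argument interface used in the docker run example in the next step (--project, --bucket, and a final run-mode argument), a Python skeleton could look like this. The "cloud" mode name is an assumption.

      # task.py -- hedged skeleton of a step's entry point, not the repo's code.
      import argparse

      def main():
          parser = argparse.ArgumentParser(description="One pipeline step")
          parser.add_argument("--project", required=True, help="GCP project id")
          parser.add_argument("--bucket", required=True, help="GCS bucket to read/write")
          parser.add_argument("mode", choices=["local", "cloud"],
                              help="where the step is running")
          args = parser.parse_args()
          # ... load data from args.bucket, run the step, write results back ...
          print(f"Running in {args.mode} mode against {args.project}/{args.bucket}")

      if __name__ == "__main__":
          main()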

  8. To test a container manually, run:

    $ docker run -t gcr.io/{YOUR_PROJECT}/{IMAGE}:latest --project {YOUR_PROJECT} --bucket {YOUR_BUCKET} local

    For example, to run the container that deploys the model to the AI Platform:

    $ docker run -t gcr.io/ml-pipeline-309409/ml-demo-deploy-toai

  9. Create a pipeline in Python using the Kubeflow Pipelines SDK (currently a notebook in AI Platform)

  10. Now we can either run the pipeline manually from the Kubeflow Pipelines dashboard (from step 2) or run it as a script; a minimal sketch of both the pipeline definition and a scripted run follows.
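    A hedged sketch of what the pipeline definition and a scripted run could look like with the kfp SDK (v1-style API); the image names for the preprocess and train steps, the run-mode argument, and the pipeline host are assumptions.

      # pipeline.py -- minimal kfp (v1-style) sketch, not the repo's exact pipeline.
      import kfp
      from kfp import dsl

      PROJECT = "your-project-id"   # placeholder
      BUCKET = f"{PROJECT}_bucket"

      @dsl.pipeline(name="ml-demo", description="preprocess -> train -> deploy")
      def ml_pipeline(project: str = PROJECT, bucket: str = BUCKET):
          preprocess = dsl.ContainerOp(
              name="preprocess",
              image=f"gcr.io/{PROJECT}/ml-demo-preprocess:latest",   # assumed image name
              arguments=["--project", project, "--bucket", bucket, "cloud"],
          )
          train = dsl.ContainerOp(
              name="train",
              image=f"gcr.io/{PROJECT}/ml-demo-train:latest",        # assumed image name
              arguments=["--project", project, "--bucket", bucket, "cloud"],
          ).after(preprocess)
          dsl.ContainerOp(
              name="deploy",
              image=f"gcr.io/{PROJECT}/ml-demo-deploy-toai:latest",  # see the docker example above
              arguments=["--project", project, "--bucket", bucket, "cloud"],
          ).after(train)

      if __name__ == "__main__":
          # Scripted run: submit directly to the Kubeflow Pipelines host
          # (the PIPELINE_HOST URL from the pipeline settings).
          client = kfp.Client(host="https://<your-pipeline-host>")
          client.create_run_from_pipeline_func(ml_pipeline, arguments={})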

πŸ›  CI

To set up CI and rebuild at every push:

  • Connect Cloud Build to GitHub, either in the Triggers UI or by running: $ ./scripts/setup_trigger.sh
  • Push the newly created cloudbuild files from GCP to origin, otherwise the trigger won't find them
  • This trigger will run every time a push to master touches any of the containers and will rebuild the affected Docker image

πŸ“¦ CD

CD is useful when we want to retrain/fine-tune the model whenever we get new data, rather than every time we update a component. So we will have a Cloud Function that triggers a training pipeline when we upload new data to Cloud Storage; a hedged sketch of such a function follows.
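As a sketch (1st-gen, GCS-triggered Python Cloud Function), the function body might look like the snippet below; the compiled pipeline file name and run name are assumptions, and PIPELINE_HOST matches the environment variable used in the steps below.

    # main.py -- hedged sketch of the Cloud Function body, not the repo's exact code.
    import os
    import kfp

    def trigger_pipeline(event, context):
        """Runs on google.storage.object.finalize events from the data bucket."""
        client = kfp.Client(host=os.environ["PIPELINE_HOST"])
        client.create_run_from_pipeline_package(
            "pipeline.yaml",                         # assumed: compiled pipeline bundled with the function
            arguments={},
            run_name=f"retrain-on-{event['name']}",  # name of the uploaded file
        )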

  1. Get the pipeline host URL from the pipeline settings (ideally save it as a PIPELINE_HOST environment variable).

  2. In the pipeline folder, run the deploy script:

    $ ./deploy_cloudfunction $PIPELINE_HOST

  3. Now, whenever a new file is added to or deleted from the project bucket, the pipeline will be rerun.

πŸ‘“ Resources used and further reading
