# GCP_Kubeflow_example

Example of a generic deep learning application deployed on Kubeflow on GCP.

In this case study an Italian COVID-19 open dataset is used to predict the spread of the virus. A Kubeflow pipeline is also created to host the predictions and monitor the results.
## Kubeflow Cluster creation (for DevOps)
See here.

- Connect to the Kubeflow deploy tool.
- Create the Kubeflow cluster, noting `PROJECT_ID`, `ZONE` and `DEPLOYMENT_NAME`. This tool will create the Kubernetes deployment, services, ingress, etc.
- If working from a local machine, launch the `install_gcloud.sh` script to set up the Google Cloud CLI; otherwise open Gcloud Shell.
- Create a new bucket, install `kfp` and enable the required services:
```bash
export DEPLOYMENT_NAME=kf-test
export PROJECT_ID=<PROJECT_NAME>
export ZONE=europe-west1-d
gcloud config set project ${PROJECT_ID}
gcloud config set compute/zone ${ZONE}
export BUCKET_NAME=${PROJECT_ID}-kubeflow
gsutil mb gs://${BUCKET_NAME}
sudo pip3 install -U kfp
gcloud services enable cloudresourcemanager.googleapis.com iam.googleapis.com file.googleapis.com ml.googleapis.com
```
- Connect to the Kubeflow cluster:
```bash
gcloud container clusters get-credentials ${DEPLOYMENT_NAME} --project ${PROJECT_ID} --zone ${ZONE}
kubectl config set-context $(kubectl config current-context) --namespace=kubeflow
```
- If not already present, create a GPU node pool:
```bash
kubectl create clusterrolebinding sa-admin --clusterrole=cluster-admin --serviceaccount=kubeflow:pipeline-runner
gcloud container node-pools create gpu-pool \
    --cluster=${DEPLOYMENT_NAME} \
    --zone ${ZONE} \
    --num-nodes=1 \
    --machine-type n1-highmem-8 \
    --scopes cloud-platform --verbosity error \
    --accelerator=type=nvidia-tesla-k80,count=1
```
## Kubeflow pipeline creation (for Developers)
- Authenticate to the Google Cloud Docker registry: `gcloud auth configure-docker`.
- For each component (train, deploy, etc.) use its `build_image.sh` script to build the component:

  ```bash
  chmod +x ./build_image.sh
  ./build_image.sh
  ```

  This will build and push the (reusable) component containers. To build components directly it is possible to use the `dsl.ContainerOp` object (see here); this is easier, but those components are not reusable. See examples here.
- For the deploy step, run the `routine/build_routine.sh` script to build the custom model execution script:

  ```bash
  chmod +x ./routine/build_routine.sh
  ./routine/build_routine.sh
  ```
- Use the Jupyter notebook `pipeline_creation.ipynb` inside the Kubeflow environment to build the pipeline from the given components.

NOTE: there is an issue with the "component upload" step of the `pipeline_creation.ipynb` workflow: updates to a component's YAML definition sometimes take time to become effective. This is due to some kind of caching in Kubeflow or GCS storage.
## Invoke model on AI Platform
Use the script in `manual_build.ipynb` to invoke the endpoints (with bash or Python).
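For the Python path, a minimal sketch of an online prediction call against AI Platform (`ml.googleapis.com`) might look like this; the `model_resource_name` helper and the project/model names are hypothetical, and the request is only sent inside `predict`:

```python
def model_resource_name(project: str, model: str, version: str = None) -> str:
    """Build the resource name expected by the AI Platform predict endpoint."""
    name = f"projects/{project}/models/{model}"
    return f"{name}/versions/{version}" if version else name


def predict(project: str, model: str, instances: list, version: str = None):
    """Send an online prediction request to AI Platform."""
    # Deferred import: requires `pip install google-api-python-client`
    # and application-default credentials on the calling machine.
    from googleapiclient import discovery

    service = discovery.build("ml", "v1")
    response = service.projects().predict(
        name=model_resource_name(project, model, version),
        body={"instances": instances},
    ).execute()
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["predictions"]
```

Omitting `version` targets the model's default version, which is usually what the deploy step of the pipeline promotes.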
Note: it is possible to deploy TensorFlow, scikit-learn, or XGBoost models. For any other custom prediction, use "Custom Prediction Routines" (see docs).
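A Custom Prediction Routine is essentially a class exposing `from_path` and `predict`; AI Platform calls the former at model load time and the latter for each request. The sketch below is a hypothetical example (the preprocessing and the `model.pkl` filename are placeholders, not this repository's actual routine):

```python
import os
import pickle


class CovidPredictor:
    """Sketch of an AI Platform Custom Prediction Routine."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # Hypothetical preprocessing: a real routine would turn the raw
        # request instances into the features the model was trained on.
        preprocessed = [[float(x) for x in row] for row in instances]
        return self._model.predict(preprocessed)

    @classmethod
    def from_path(cls, model_dir):
        # Load whatever artifact the deploy step packaged with the model.
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            model = pickle.load(f)
        return cls(model)
```

The class is packaged as a source distribution and referenced at version-creation time, which is the role played here by `routine/build_routine.sh`.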
## Next steps

- Fix the problem with unloaded metrics and TensorBoard (see the Kubeflow output docs)
- Add a validation step
- Add a CI/CD pipeline to build the Kubeflow components