This repository contains two notebooks that demonstrate how to automate producing a new AutoML model whenever a new dataset comes in. The project uses Vertex AI in general, and specifically Vertex Managed Dataset, Vertex Pipelines, Vertex AutoML, Cloud Storage, and Cloud Functions on Google Cloud Platform.
There are two notebooks in this project. Everything can be set up by running each cell in the notebooks. The only thing you need to do manually is to set up IAM permissions.
- For Vertex Pipelines, we need the Vertex AI Admin, Cloud Storage Viewer, and Cloud Storage Editor permissions (some ML components need to access the managed dataset, and the pipeline spec itself is stored in a GCS (Google Cloud Storage) bucket, so the listed permissions are required). These can be set up on the compute service account, since Vertex Pipelines uses Compute Engine instances to run each component of the ML pipeline. Also, don't forget to enable the compute service account and the Vertex AI API.
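The IAM setup above can be sketched with gcloud. The role IDs below are assumptions for the permissions named in the text (Vertex AI Admin is taken to be `roles/aiplatform.admin`, and the two Cloud Storage permissions are covered here by `roles/storage.objectAdmin`); adjust them to your project's policy:

```shell
PROJECT_ID=$(gcloud config get-value project)
PROJECT_NUMBER=$(gcloud projects describe "$PROJECT_ID" --format='value(projectNumber)')
COMPUTE_SA="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

# Enable the Vertex AI and Compute Engine APIs
gcloud services enable aiplatform.googleapis.com compute.googleapis.com

# Grant Vertex AI Admin and Cloud Storage access to the default compute service account
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:${COMPUTE_SA}" \
  --role="roles/aiplatform.admin"
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:${COMPUTE_SA}" \
  --role="roles/storage.objectAdmin"
```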
- For Cloud Functions, we need the Vertex AI Admin permission and Cloud Build enabled. The Docker image that the Cloud Function is based on is built by Cloud Build, so the Cloud Build API must be enabled. Also, the Cloud Function triggers the Vertex Pipeline, which is why Vertex AI Admin is required.
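Enabling these APIs and deploying the function can be sketched as follows; the function name, bucket, runtime, and entry point are illustrative placeholders, not values from this repository:

```shell
# Enable Cloud Functions and the Cloud Build API that builds the function's image
gcloud services enable cloudfunctions.googleapis.com cloudbuild.googleapis.com

# Deploy a function triggered by uploads to the data bucket (names are illustrative)
gcloud functions deploy trigger-vertex-pipeline \
  --runtime=python39 \
  --trigger-bucket=YOUR_DATA_BUCKET \
  --entry-point=process \
  --source=.
```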
- cifar-10-vertex-autoML-pipeline
  - This notebook should be run before the second notebook. It prepares the Kubeflow pipeline with two additional custom components that determine whether a managed dataset already exists. The notebook produces the pipeline spec JSON file and puts it in the GCS bucket.
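The decision made by those custom components can be sketched as pure Python. In the real component the list of display names would come from a Vertex AI SDK dataset `list` call, which is assumed (not shown) here:

```python
def decide_dataset_action(existing_display_names, target_name):
    """Return which branch of the pipeline to take: create a brand-new
    managed dataset, or import the new data into the existing one.

    `existing_display_names` stands in for the display names returned by
    a Vertex AI dataset list call (hypothetical in this sketch).
    """
    return "import" if target_name in existing_display_names else "create"


print(decide_dataset_action(["cifar10"], "cifar10"))  # dataset already exists
print(decide_dataset_action([], "cifar10"))           # first run, no dataset yet
```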
- prepare-cifar10-subset
  - This notebook creates two subsets of the CIFAR-10 dataset for simulation purposes. It also provides the codebase for the Cloud Function, and you can deploy it directly from within the notebook. Lastly, it simulates the continuous-adaptation scenario by uploading each subset of data sequentially.