Semantic Segmentation model within an ML pipeline
This repository shows how to build a Machine Learning pipeline for a semantic segmentation task with TensorFlow Extended (TFX) and various GCP products such as Vertex Pipelines, Vertex Training, and Vertex Endpoints. The ML pipeline also contains a few custom components integrated with the Hugging Face Hub:
- HFModelPusher pushes the trained model to the Hugging Face Model Hub.
- HFSpacePusher creates a Hugging Face Space hosting a Gradio application with the latest model out of the box.
NOTE: We use a U-Net based TensorFlow model from the official tutorial. Since our focus is on implementing an ML pipeline, a U-Net-like model is a good starting point. Other SOTA models, such as SegFormer from Hugging Face Transformers, will be explored later.
NOTE: The aim of this project is not to serve a fully fine-tuned model. Our main focus is to demonstrate how to build an ML pipeline for a semantic segmentation task.
The TFX pipeline is designed to run in both local and GCP environments.
On a local environment
```shell
$ cd training_pipeline
$ tfx pipeline create --pipeline-path=local_runner.py \
    --engine=local
$ tfx pipeline compile --pipeline-path=local_runner.py \
    --engine=local
$ tfx run create --pipeline-name=segformer-training-pipeline \
    --engine=local
```
On the Vertex AI environment
There are two ways to run the TFX pipeline on the GCP environment (Vertex AI).
First, you can run it manually with the following CLIs. In this case, you should replace `GOOGLE_CLOUD_PROJECT` with your GCP project ID in `training_pipeline/pipeline/configs.py`.
```shell
$ cd training_pipeline
$ tfx pipeline create --pipeline-path=kubeflow_runner.py \
    --engine=vertex
$ tfx pipeline compile --pipeline-path=kubeflow_runner.py \
    --engine=vertex
$ tfx run create --pipeline-name=segformer-training-pipeline \
    --engine=vertex \
    --project=$GCP_PROJECT_ID \
    --region=$GCP_REGION
```
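For reference, a minimal sketch of what the relevant part of `training_pipeline/pipeline/configs.py` might look like. Only `GOOGLE_CLOUD_PROJECT` is confirmed by this README; the other constant names and values are illustrative assumptions:

```python
# Hypothetical excerpt of training_pipeline/pipeline/configs.py.
# Only GOOGLE_CLOUD_PROJECT is confirmed by this README; the other
# names below are illustrative placeholders.
PIPELINE_NAME = "segformer-training-pipeline"

# Replace this with your own GCP project ID before running the pipeline.
GOOGLE_CLOUD_PROJECT = "your-gcp-project-id"
GOOGLE_CLOUD_REGION = "us-central1"

# GCS bucket holding the prepared input dataset and pipeline artifacts.
GCS_BUCKET_NAME = GOOGLE_CLOUD_PROJECT + "-vertex-pipeline"
```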
Alternatively, you can use the `workflow_dispatch` feature of GitHub Actions. In this case, go to the Actions tab, select Trigger Training Pipeline on the left pane, then click Run workflow on the branch of your choice. The GCP project ID given in the input parameters will automatically replace the one in `training_pipeline/pipeline/configs.py`, and it will also be injected into the `tfx run create` CLI.
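The substitution step can be sketched in plain Python. This is a minimal, hypothetical version of what the workflow does conceptually; the function name is illustrative, and the actual workflow may use sed or another mechanism:

```python
import re

def inject_project_id(config_source: str, project_id: str) -> str:
    """Replace the value assigned to GOOGLE_CLOUD_PROJECT in a
    configs.py source string with the given project ID.

    Hypothetical helper mirroring what the GitHub Action workflow
    does conceptually when it rewrites configs.py.
    """
    return re.sub(
        r'(GOOGLE_CLOUD_PROJECT\s*=\s*)["\'][^"\']*["\']',
        rf'\g<1>"{project_id}"',
        config_source,
    )

config = 'GOOGLE_CLOUD_PROJECT = "gcp-project-placeholder"\n'
print(inject_project_id(config, "my-gcp-project"))
# prints: GOOGLE_CLOUD_PROJECT = "my-gcp-project"
```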
- Notebook to prepare input dataset in
- Upload the input dataset into the GCS bucket
- (Optional) Add an additional Service Account Key as a GitHub Actions Secret if collaborators want to run the ML pipeline on GCP with their own GCP accounts. Each word of the secret key name should be separated with an underscore. For example,
- Modify the modeling part to train a TensorFlow-based U-Net model.
- Modify the Gradio app part. The initial version is copied from the segformer-tf-transformers repository.
- Modify the pipeline part. It may be necessary to remove some optional components such as
- Modify configs.py to reflect the changes.
- (Optional) Add a custom TFX component to dynamically inject hyperparameters to search with
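The U-Net modeling step above can be sketched with Keras. This is a minimal toy version assuming 128x128 RGB inputs and 3 output classes; the layer sizes are illustrative and not the official tutorial's exact architecture:

```python
import tensorflow as tf

def build_toy_unet(input_shape=(128, 128, 3), num_classes=3):
    """A minimal U-Net-style model: one downsampling step, one
    upsampling step, and a single skip connection."""
    inputs = tf.keras.Input(shape=input_shape)

    # Encoder: convolve, keep a skip tensor, then downsample.
    skip = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling2D()(skip)
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)

    # Decoder: upsample back and concatenate with the skip connection.
    x = tf.keras.layers.Conv2DTranspose(16, 3, strides=2, padding="same")(x)
    x = tf.keras.layers.Concatenate()([x, skip])

    # Per-pixel class logits, one channel per class.
    outputs = tf.keras.layers.Conv2D(num_classes, 1, padding="same")(x)
    return tf.keras.Model(inputs, outputs)

model = build_toy_unet()
```

The skip connection is what makes this "U-Net-like": high-resolution encoder features are concatenated back into the decoder so per-pixel predictions keep spatial detail.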
We are thankful to the ML Developer Programs team at Google for providing GCP support.