tatchiwiggers / recap-1209

Vertex AI Workbench

Let's explore the Vertex AI Workbench as an alternative to Compute Engine for model training.

Vertex AI Workbench provides managed virtual machines, so you can run ML code without configuring the environment by hand. It offers two notebook types:

  • User-managed notebooks provide a customizable environment and allow you to specify package versions
  • Managed notebooks can use custom containers, can be extended to read from or write to BigQuery or Cloud Storage, and can be scheduled to run at set times

Create a Workbench Instance

Create a workbench instance:

  1. Access the Vertex AI Workbench page
  2. At the top, select the USER-MANAGED NOTEBOOKS tab and click on the blue CREATE NOTEBOOK button further below
  3. Give your notebook the following name: cloud-training-recap
  4. In the Environment section, set the operating system to Ubuntu 20.04
  5. Still in this section, select TensorFlow Enterprise 2.10 (without GPU) as the environment
  6. Scroll down and click on CREATE

👉 The workbench should be ready in a couple of minutes

Open the virtual machine

  • Click on OPEN JUPYTERLAB
  • Install gh for Ubuntu

Install zsh and oh-my-zsh

Install zsh:

sudo apt-get install zsh

Install oh-my-zsh:

sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

Authenticate on GitHub 1/2

Go to the workbench instance and open the Terminal.

Run the gh auth login command and answer the prompts as follows:

  • Account: GitHub.com
  • Protocol: HTTPS
  • Authenticate Git with your GitHub credentials: Yes
  • Authentication method: Paste an authentication token

Create a GitHub Token

Create a GitHub token to allow the workbench to access your account:

  1. Access GitHub Tokens
  2. Click on generate new token
  3. Fill in the Note field with a meaningful name, such as Vertex AI Workbench token
  4. Check that these scopes are enabled: 'repo', 'read:org', 'workflow'
  5. Click on generate token
  6. Copy the token (you will not be able to retrieve it later)
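
Before pasting the token, it can help to confirm that it carries the three scopes listed above. The sketch below only compares a comma-separated scope string against that list. For classic tokens, GitHub's REST API reports the granted scopes in the `X-OAuth-Scopes` response header, and `gh auth status` prints them too. The function name `missing_scopes` is illustrative, and the check is literal: it does not know that a broader scope can imply a narrower one.

```python
# Scopes named in step 4 above.
REQUIRED_SCOPES = {"repo", "read:org", "workflow"}


def missing_scopes(scopes_header, required=REQUIRED_SCOPES):
    """Return the required scopes absent from a comma-separated scope string."""
    granted = {scope.strip() for scope in scopes_header.split(",") if scope.strip()}
    return set(required) - granted


# Example: a token created without the read:org scope.
print(sorted(missing_scopes("repo, workflow")))  # ['read:org']
```

An empty result means all three scopes are present in the string you passed in.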

Authenticate on GitHub 2/2

Back in the Workbench terminal, paste the token at the gh auth login prompt.

Clone your project repo

Clone your recap repo inside your Workbench VM using the Kitt-generated token shown at the top of the Kitt web page.

# Create challenge folder
mkdir -p ~/code/lewagon/data-recap-cloud-training && cd $_

# Download challenge
curl -s -H "Authorization: Token <REPLACE_BY_KITT_MYRIAD_TOKEN>, User=$(gh api user --jq '.login')" "https://kitt.lewagon.com/camps/1002/challenges/setup_script?path=07-ML-Ops%2F02-Cloud-training%2FRecap" | bash
cp .env.sample .env
nano .env
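
The `path` query parameter in the download URL above is the challenge path with each `/` percent-encoded as `%2F`. The sketch below reproduces that encoding with the standard library, which may be useful if you need to adapt the command to another challenge. The helper name `encode_challenge_path` is illustrative and not part of Kitt.

```python
from urllib.parse import quote


def encode_challenge_path(path):
    """Percent-encode a challenge path so it can be used as a query value."""
    # safe="" makes quote() encode "/" as well; by default it is left as-is.
    return quote(path, safe="")


print(encode_challenge_path("07-ML-Ops/02-Cloud-training/Recap"))
# 07-ML-Ops%2F02-Cloud-training%2FRecap
```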

Install direnv:

curl -sfL https://direnv.net/install.sh | bash
eval "$(direnv hook zsh)"
direnv allow .

Install the package:

pip install -e .
make reset_local_files

Run the preprocessing and the training:

make run_preprocess run_train

New Workbench Terminal

Each new Workbench terminal needs direnv hooked manually before the .env variables are loaded:

eval "$(direnv hook zsh)"
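
To confirm that the hook took effect, a short Python check run from the project folder can report which variables are still unset. This is a minimal sketch: the list of names is taken from the variables used in this recap and may not match your .env exactly, and `missing_env_vars` is an illustrative name.

```python
import os

# Variable names used in this recap; adjust to match your own .env.
EXPECTED_VARS = [
    "DATA_SIZE",
    "MODEL_TARGET",
    "GCP_PROJECT",
    "GCP_REGION",
    "BQ_REGION",
    "BQ_DATASET",
]


def missing_env_vars(names=EXPECTED_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    print("Missing:", ", ".join(missing) if missing else "none")
```

If the script lists every variable as missing, direnv has most likely not been hooked or allowed in the current terminal.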

Handling the .env in JupyterLab

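Notebook kernels do not go through the direnv shell hook, so the variables in .env are not loaded automatically. The easiest solution is to manually define the environment variables from Python: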

import os

os.environ["DATA_SIZE"] = "1k"
os.environ["MODEL_TARGET"] = "local"
os.environ["GCP_PROJECT_WAGON"] = "wagon-public-datasets"
os.environ["GCP_PROJECT"] = ... # Your personal GCP project for this bootcamp
os.environ["GCP_REGION"] = "europe-west1"
os.environ["CLOUD_STORAGE"] = "europe-west1"
os.environ["BQ_REGION"] = "EU"
os.environ["BQ_DATASET"] = "taxifare"
...
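
As an alternative to typing each variable, the notebook can read them from the .env file. The sketch below is a minimal parser that assumes plain `KEY=VALUE` lines, optionally prefixed with `export` and optionally quoted. It does not handle inline comments or variable interpolation. The function name `load_env_file` is illustrative. If the `python-dotenv` package is installed, its `load_dotenv` function serves the same purpose.

```python
import os


def load_env_file(path=".env", override=True):
    """Parse KEY=VALUE lines from a file, export them, and return the parsed pairs."""
    parsed = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything that is not an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            if key.startswith("export "):
                key = key[len("export "):].strip()
            # Drop surrounding single or double quotes, if any.
            value = value.strip().strip("'\"")
            parsed[key] = value
            if override or key not in os.environ:
                os.environ[key] = value
    return parsed
```

Calling `load_env_file()` with the path to your project's .env in the first notebook cell sets the variables for the rest of the session. Pass `override=False` to keep values that are already defined.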

Compute Engine vs Vertex AI Workbench

In the Compute Engine console, the Workbench instance appears in the list of VM instances: Vertex AI Workbench uses a Compute Engine instance behind the scenes.
