Lightweight MLOps Zoomcamp

This is stripped-down version of MLOps Zoomcamp

About the instructor:

Founder of DataTalks.Club - community of 40k+ data enthusiasts
Author of ML Bookcamp
Instructor of ML Zoomcamp and MLOps Zoomcamp
Connect with me on LinkedIn and Twitter
Do you want to run a similar workshop at your company? Write me at alexey@datatalks.club

In this workshop, we will:

Track experiments
Create a training pipeline
Register our model in the registry
Serve the model
Monitor the performance

In MLOps Zoomcamp we show how to use specific tools for achieving this. But in this workshop we'll focus more on the concepts. We'll use one tool - ML flow, but the principles will apply to any tool.

Plan

5 min: Discuss what's MLOps and how it helps with the entire ML project lifecycle
10 min: Prepare the environment and train our model (ride duration prediction)
10 min: Install and run MLFlow for experiment tracking
10 min: Use Scikit-Learn pipelines to make model management simpler
10 min: Convert a notebook for training a model to a Python script
15 min: Save and load the model with MLFlow model registry (and without)
15 min: Serve the model as a web service
10 min: Monitor the predictive performance of this model
5 min: Summary & wrapping up

What's MLOps

Poll: What's MLOps?

https://datatalks.club/blog/mlops-10-minutes.html

Preparation

We'll start with the model we already trained
Copy this notebook to "duration-prediction.ipynb"
This model is used for preducting the duration of a taxi trip

You can use any environment for running the content. In the workshop, we rented an EC2 instance on AWS:

Name: "mlops-workshop-2023"
Ubuntu 22.04 64 bit
Instance type: t2.xlarge
30 gb disk space
Give it an IAM role with S3 read/write access
We will need to forward ports 8888 and 5000

Script for preparing the instance:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
rm Miniconda3-latest-Linux-x86_64.sh

# add it to .bashrc
export PATH="$HOME/miniconda3/bin:$PATH"

sudo apt install jq

We'll start with preparing the environement for the workshop

pip install pipenv

Create the env in a separate folder (e.g. "train"):

pipenv --python=3.11

Install the dependencies

pipenv install scikit-learn==1.3.1 pandas pyarrow seaborn
pipenv install --dev jupyter

On Linux you might also need to instal pexpect for jupyter:

pipenv install --dev jupyter pexpect

Run poll: "Which virtual environment managers have you used"

Options:

Conda
Python venv
Pipenv
Poetry
Other
Didn't use any

We will use the data from the NYC TLC website:

Download the starter notebook:

wget https://raw.githubusercontent.com/alexeygrigorev/lightweight-mlops-zoomcamp/main/train/duration-prediction-starter.ipynb
mv duration-prediction-starter.ipynb duration-prediction.ipynb

Run the notebook

pipenv run jupyter notebook

Experiment tracking

First, let's add mlflow for tracking experiments

pipenv install mlflow==2.7.1 boto3

Run MLFlow locally (replace it with your bucket name)

pipenv run mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root s3://mlflow-models-alexey

Open it at http://localhost:5000/

Connect to the server from the notebook

import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("nyc-taxi-experiment")

Log the experiment:

with mlflow.start_run():
    categorical = ['PULocationID', 'DOLocationID']
    numerical = ['trip_distance']

    train_dicts = df_train[categorical + numerical].to_dict(orient='records')
    val_dicts = df_val[categorical + numerical].to_dict(orient='records')

    mlflow.log_params({
        'categorical': categorical,
        'numerical': numerical,
    })
    
    dv = DictVectorizer()
    X_train = dv.fit_transform(train_dicts)
    X_val = dv.transform(val_dicts)
    
    lr = LinearRegression()
    lr.fit(X_train, y_train)
    y_pred = lr.predict(X_val)

    rmse = mean_squared_error(y_val, y_pred, squared=False)
    print(rmse)
    mlflow.log_metric('rmse', rmse)
    
    with open('dict_vectorizer.bin', 'wb') as f_out:
        pickle.dump(dv, f_out)
    mlflow.log_artifact('dict_vectorizer.bin')

    mlflow.sklearn.log_model(lr, 'model')

Replace it with a pipeline:

from sklearn.pipeline import make_pipeline

pipeline = make_pipeline(
    DictVectorizer(),
    LinearRegression()
)

pipeline.fit(train_dicts, y_train)
y_pred = pipeline.predict(val_dicts)

mlflow.sklearn.log_model(pipeline, 'model')

Training pipeline

Convert the notebook to a script

pipenv run jupyter nbconvert --to=script duration-prediction.ipynb

Rename the file to train.py and clean it

Run it:

pipenv run python train.py

Model registry

Let's get this model

model = mlflow.pyfunc.load_model('models:/trip_duration/staging')

And use it:

y_pred = model.predict(val_dicts)

trip = {
    'PULocationID': '43',
    'DOLocationID': '238',
    'trip_distance': 1.16
}

model.predict(trip)

In some cases we don't want to depend on the MLFlow model registry to be always available. In this case, we can get the S3 path of the model and use it directly for initializing the model

wget https://raw.githubusercontent.com/alexeygrigorev/lightweight-mlops-zoomcamp/main/train/storage_uri.py

MODEL_METADATA=$(pipenv run python storage_uri.py \
    --tracking-uri http://localhost:5000 \
    --model-name trip_duration \
    --stage-name staging)
echo ${MODEL_METADATA}

Now we can use the storage URL to load the model:

model = mlflow.pyfunc.load_model(storage_url)
y_pred = model.predict(val_dicts)

Serving

Poll: "What can we use for serving an ML model?"

Now let's go to the serve folder and create a virtual environment

pipenv --python=3.11
pipenv install \
    scikit-learn==1.3.1 \
    mlflow==2.7.1 \
    boto3 \
    flask \
    gunicorn

Create a simple flask app (see serve.py)

wget https://raw.githubusercontent.com/alexeygrigorev/lightweight-mlops-zoomcamp/main/serve/serve.py

Run it:

echo ${MODEL_METADATA} | jq

export MODEL_VERSION=$(echo ${MODEL_METADATA} | jq -r ".run_id")
export MODEL_URI=$(echo ${MODEL_METADATA} | jq -r ".source")

pipenv run python serve.py

Test it:

REQUEST='{
    "PULocationID": 100,
    "DOLocationID": 102,
    "trip_distance": 30
}'
URL="http://localhost:9696/predict"

curl -X POST \
    -d "${REQUEST}" \
    -H "Content-Type: application/json" \
    ${URL}

Now package the model with Docker and deploy it (outside of the scope for this tutorial).

Monitoring

Now let's add logging to our model. For that, we will save all the predictions somewhere. Later we will load these preditions and see if the features are drifting.

First, we need to have a way to correlate the request and the response. For that, each request needs to have an ID.

We can change the request to look like that:

{
    "ride_id": "ride_xyz",
    "ride": {
        "PULocationID": 100,
        "DOLocationID": 102,
        "trip_distance": 30
    }
}

Let's change the serve.py file to handle this and also return the ID along with the predictions (see serve_v2.py)

wget https://raw.githubusercontent.com/alexeygrigorev/lightweight-mlops-zoomcamp/main/serve/serve_v2.py

New request:

REQUEST='{
    "ride_id": "ride_xyz",
    "ride": {
        "PULocationID": 100,
        "DOLocationID": 102,
        "trip_distance": 30
    }
}'
URL="http://localhost:9696/predict"

curl -X POST \
    -d "${REQUEST}" \
    -H "Content-Type: application/json" \
    ${URL}

Now let's log the predictions - see the log function in serve_v2.py

Here we will use filesystem, but in practice you should never do it and use tools like logstash, kafka, kinesis, mongo and so on.

A good approach could be writing results to Kinesis and then dumping using Kinesis Firehose to save the results to S3 (see an example here)

Now we start sending the requests and collect enough of them for a couple of days.

Let's analyze them (see the notebook in the monitor notebook in the train folder).

For that let's pretend we saved all the predictions, but in reality we'll just run them in our notebook

Let's use these datasets:

After that, we can extract this notebook to a script, and if we detect an issue, send an alert, re-train the model or do some other actions.

What's next

Use Prefect/Airflow/etc for orchestration
Use BentoML/KServe/Seldon for deployment
Use Evidently/whylogs/Seldon for monitoring

If you want to learn how to do it - check our MLOps Zoomcamp course

alexeygrigorev / lightweight-mlops-zoomcamp