SageMaker End to End lab for Building, Training, and Deploying ML models

In this repository, we are stepping through an end to end implementation of Machine Learning (ML) models using Amazon SageMaker, by deploying versioned models stored in the Amazon SageMaker Model Registry for real-time inference using Amazon SageMaker Hosting Services.

This is a sample code repository for demonstrating how to organize your code for build and train your model, by starting from an implementation through notebooks (Prototyping phase) for arriving to a code structure architecture for implementing ML pipelines using Amazon SageMaker Pipeline, and how to setup a repository for deploying ML models using CI/CD.

This repository is enriched by CloudFormation templates for setting up the ML environment, by creating the SageMaker Studio environment, Networking, and CI/CD for deploying ML models.

Everything can be tested by using the example notebooks for running training on SageMaker using the following frameworks:

Machine Learning Task

In this example, we are performing a Sentiment Analysis task by creating a multiclass classification model using Huggingface - Amazon Bort and training on a dataset of Negative, Neutral, Positive tweets.

Negative - 0
Neutral - 1
Positive - 2

Content overview

In this repository, you will cover an end-to-end approach for building, training, deploying, and monitoring a ML model for fraud detection by using Amazon SageMaker.

Data Visualization using Amazon SageMaker Studio Notebooks
Data Preparation for training using Amazon SageMaker Processing, by exploring the different capabilities that Amazon SageMaker is providing us:
1. Framework Processor Use the Amazon SageMaker framework containers by providing your own custom python scripts and python dependencies using a requirements.txt
2. Custom Scripts Container Build your own custom container that has your python dependencies installed and provide your own custom python scripts
3. BYOC Build your own custom container that has your own code and dependencies
Store dataset features by using Amazon SageMaker Feature Store
Train, and version ML models using Amazon SageMaker Training, by exploring the different capabilities that Amazon SageMaker is providing us, and Amazon SageMaker Model Registry:
1. Framework Container Use the Amazon SageMaker framework containers by providing your own custom python scripts and python dependencies using a requirements.txt
2. Custom Scripts Container Adapt your own custom container that has your python dependencies installed by using the Amazon SageMaker Training Toolkit and provide your own custom python scripts
3. BYOC by building your own custom container that has your own code and dependencies
Hyperparameters optimization by using Amazon SageMaker Hyperparameter Tuning
ML workflow Pipeline creation, by using Processing, Training, and Versioning steps, by using Amazon SageMaker Pipeline
Real-Time endpoint deployment using Amazon SageMaker Hosting Services, by exploring the different capabilities that Amazon SageMaker is providing us:
1. Framework Predictor Provide your own inference script and deploy an ML model, taken from the Amazon SageMaker Model Registry, by using the Amazon SageMaker framework container
2. BYOC Build your own custom container that has your own code and dependencies, and expose the necessary endopoints for Amazon SageMaker Real-Time Endpoint
Feature attributions computation for model explainability by using Amazon SageMaker Clarify
Monitor model quality by using Amazon SageMaker Model Monitoring with the possibility to execute Model Quality jobs manually for testing batch executions

1. Machine Learning End to End with Amazon SageMaker

As a first step, as a ML Expert I would like to experiment through notebooks the different capabilities offered by Amazon SageMaker.

For the purpose of this section, you will see how to manage Notebooks and call APIs for interacting with the different Amazon SageMaker services.

Environment Setup

Setup the ML environment by deploying the CloudFormation templates described as below:

00-networking: This template is creating a networking resources,
such as VPC, Private Subnets, Security Groups, for a secure environment for Amazon SageMaker Studio. The necessary variables used by SageMaker Studio are stored using AWS Systems Manager
01-sagemaker-studio: This template is creating the SageMaker Studio environment, with the necessary execution role used during the experimentation and the execution of the SageMaker Jobs. Parameters:
1. SageMakerDomainName: Name to assign to the Amazon SageMaker Studio Domain. Mandatory
2. SecurityGroupId: Provide a Security Group for studio if you want to use your own networking setup, otherwise the parameter is read by using AWS SSM after the deployment of the template 00-networking. Optional
3. SubnetId: Provide a Subnet (Public or Private) for studio if you want to use your own networking setup, otherwise the parameter is read by using AWS SSM after the deployment of the template 00-networking. Optional
4. VpcId: Provide a Vpc ID for studio if you want to use your own networking setup, otherwise the parameter is read by using AWS SSM after the deployment of the template 00-networking. Optional
02-ml-environment: This template is creating the necessary resources for the ML environment, such as Amazon S3 bucket for storing code and model artifacts, and Amazon SageMaker Model Registry for versioning trained ML models. Parameters:
1. KMSAlias: Alias to use for the KMS Key for encrypting data. Optional
2. ModelPackageGroupName: Name of the Amazon SageMaker Model Package Group where ML models will be stored. Mandatory
3. S3BucketName: Name for the S3 bucket where ML model artifacts, code artifacts, and data will be stored. Mandatory

Labs

1. Prepare data, train ML models using Amazon SageMaker

Explore the directory 00-ml-build-train

The code structure defined for the Build and Train ML models is the following:

algorithms: The code used by the ML pipelines for processing and training ML models is stored in this folder
- algorithms/processing: This folder contains the python code for performing processing of data using Amazon SageMaker Processing Jobs
- algorithms/training: This folder contains the python code for training a custom ML model using Amazon SageMaker Training Jobs
mlpipelines: This folder contains some utilities scripts created in the official AWS example Amazon SageMaker secure MLOps and it contains the definition for the Amazon SageMaker Pipeline used for training
- mlpipelines: This folder contains the python code for the ML pipelines used for training
notebooks: This folder contains the lab notebooks to use for this workshop:
- notebooks/00-Data-Visualization: Explore the input data and test the processing scripts in the notebook
- notebooks/01-Prepare-Data-ML-Framework-Container: Define a Python Script and create jobs for data processing using Amazon SageMaker Processing
- notebooks/02-Prepare-Data-ML-Custom-Script-Container: Create jobs for data processing using Amazon SageMaker Processing with a Custom Container by providing a Custom Script
- notebooks/03-Prepare-Data-ML-Custom-Container: Create jobs for data processing using Amazon SageMaker Processing with by using a Custom Container (BYOC)
- notebooks/04-Store-Features: Store features from prepared data using Amazon SageMaker Feature Store
- notebooks/05-Training-Build-Model-Framework-Container: Train a ML model using SageMaker Training, Register the trained model version by using Amazon SageMaker Model Registry.
- notebooks/06-Train-Build-Model-Custom-Script-Containerl: Train a ML model using SageMaker Training with a Custom Container by providing Custom Scripts, and Register the trained model version by using Amazon SageMaker Model Registry.
- notebooks/07-Train-Build-Model-Custom-Container: Train a ML model using SageMaker Training by using a Custom Container (BYOC), and Register the trained model version by using Amazon SageMaker Model Registry.
- notebooks/08-Hyperparameter-Optimization: Identify the best configuration set of hyperparameters for your ML algorithm by using Amazon SageMaker Hyperparameter Optimization
- notebooks/09-SageMaker-Pipeline-Training: Define the workflow steps and test the entire end to end using Amazon SageMaker Pipeline

2. Evaluate, Deploy ML models with Real-Time endpoint, and Monitor ML model quality

Explore the directory 01-ml-deploy

The code structure defined for the Deploy ML models is the following:

algorithms: The code used by Amazon SageMaker Endpoint for performing inference is stored in this folder
- algorithms/inference: This folder contains the python code for performing inference using Amazon SageMaker Endpoints.
- algorithms/inference-custom-container: This folder contains the python code for performing inference using Amazon SageMaker Endpoints. It contains the code structure for creating an application server, the Dockerfile for creating an Image to be used with Amazon SageMaker Hosting.
mlpipelines: This folder contains some utilities scripts created in the official AWS example Amazon SageMaker secure MLOps and it contains the scripts to run through CI/CD for creating or updating Amazon SageMaker Endpoints
notebooks: This folder contains the lab notebooks to use for this workshop:
- notebooks/00-SageMaker-Endpoint: This notebook shows you how to deploy a trained ML model taken from the Amazon SageMaker Model Registry
- notebooks/01-SageMaker-Endpoint-Custom-Container: This notebook shows you how to deploy a trained ML model taken from the Amazon SageMaker Model Registry by using a Custom Docker Image for a Real-Time inference (BYOC)
- notebooks/02-Model-Monitor: Analyze feature importance in your deployed model by using Amazon SageMaker Clarify
- notebooks/03-Model-Monitor: This notebook shows you how to create Amazon SageMaker Model Monitor jobs for monitoring model quality on the deployed endpoint
- notebooks/04-Pipeline-Deployment: Define the workflow steps and test the entire end to end using the script for CI/CD deployment

2. Automate model deployments with AWS MLOps

Once the prototyping phase is finished, as a ML Engineer I would like to automate all the steps executed through notebooks in the previous section.

For this reason, it's really important to define proper Git Repositories where my code will be versioned, define a proper repository and code structure, and automate the ML and CI/CD steps through the usage of pipelines.

For the purpose of this example, we have defined the following repository structure:

* repo
  * algorithms: Where ML Experts have to store the code for data processing, training, and inference
  * mlpipelines: Where ML Experts and MLOps Engineers have to define the ML steps in a proper workflow
  * notebooks: Where ML Experts have to store experimentation notebooks
  * buildspec.yml: Where DevOps Engineers have to define the CI/CD steps for building code packages, execute ML Pipelines, and deploy packages into instances

For this Lab section, we will define two different code repositories for keeping separate the different ML phases:

ml-build-train: repository used for the ML phases that led to the creation of a ML model
ml-deploy: repository used for the ML phases that led to the deployment of a ML model

Environment Setup

Setup the ML environment by deploying the CloudFormation templates described as below:

03-ci-cd: This template is creating the CI/CD pipelines using AWS CodeBuild and AWS CodePipeline. It creates two CI/CD pipelines, linked to two AWS CodeCommit repositories, one for training and one for deployment, that can be triggered through pushes on the main branch or with the automation part deployed by using Amazon EventBridge Rule, for monitoring updates in the SageMaker Model Registry and start the CI/CD pipeline for deploying ML models in the production environments. Parameters: 2. PipelineSuffix: Suffix to use for creating the CI/CD pipelines. Optional 3. RepositoryTrainingName: Name for the repository where the build and train code will be stored. Optional 4. RepositoryDeploymentName: Name for the repository where the deployment code will be stored. Optional 5. S3BucketArtifacts: Name of the Amazon S3 Bucket that will be created in the next stack used for storing code and model artifacts. Mandatory

The CloudFormation templates provided are creating a fully worked ML environment with CI/CD pipelines for automating the training of ML models and the deployment of real-time endpoints.

By starting from the AWS CodeCommit repositories created, you can customize the execution of CI/CD pipelines by editing the configurations for the repositories ml-build-train and ml-deploy.

Commits and Pushes in the two AWS CodeCommit repositories will automate the execution of the AWS CodePipeline for executing the training using the Amazon SageMaker Pipeline and the deployment of a real-time endpoint.

Labs

1. Continuous training with AWS CodePipeline and Amazon SageMaker Pipeline

The previously deployed CloudFormation template created two AWS CodeCommit repositories:

By starting from the AWS CodeCommit repositories created, you can customize the execution of CI/CD pipelines by editing the configurations for the repositories ml-build-train and ml-deploy.

Clone the repo ml-build-train
Copy the content of 00-ml-build-train inside the created folder
Edit the YAML configuration file.

Example:

training:
    pipeline_name: MLOpsTrainPipeline
    region: eu-west-1
    role: # Execution role name for SageMaker
    kms_account_id: # AWS Account ID where the KMS key was created
    kms_alias: ml-kms
    bucket_name: # Amazon S3 Bucket where ML models will be stored
    inference_instance_type: ml.m5.xlarge
    model_package_group_name: ml-end-to-end-group
    processing_artifact_path: artifact/processing
    processing_artifact_name: sourcedir.tar.gz
    processing_framework_version: 0.23-1
    processing_instance_count: 1
    processing_instance_type: ml.t3.large
    processing_input_files_path: data/input
    processing_output_files_path: data/output
    training_artifact_path: artifact/training
    training_artifact_name: sourcedir.tar.gz
    training_output_files_path: models
    training_framework_version: 2.5
    training_python_version: py37
    training_instance_count: 1
    training_instance_type: ml.g4dn.xlarge
    training_hyperparameters:
        epochs: 6
        learning_rate: 1.45e-4
        batch_size: 100

Commit and push changes in the repository

2. Continuous deployment by using AWS CodePipeline, Amazon EventBridge, and Amazon SageMaker Model Registry

The previously deployed CloudFormation template created two AWS CodeCommit repositories:

By starting from the AWS CodeCommit repositories created, you can customize the execution of CI/CD pipelines by editing the configurations for the repositories ml-build-train and ml-deploy.

Clone the repo ml-deploy
Copy the content of 01-ml-deploy inside the created folder
Edit the YAML configuration file.

Example:

deployment:
    pipeline_name: MLOpsDeploymentPipeline
    region: eu-west-1
    kms_account_id: # AWS Account ID where the KMS key was created
    kms_alias: ml-kms
    env: dev
    role:
    bucket_artifacts: # Amazon S3 Bucket where ML model artifacts are stored
    bucket_inference: # Amazon S3 Bucket for ML inference
    inference_artifact_path: artifact/inference
    inference_artifact_name: sourcedir.tar.gz
    inference_instance_count: 1
    inference_instance_type: ml.m5.xlarge
    model_package_group: ml-end-to-end-group
    monitoring_output_path: data/monitoring/captured
    training_framework_version: 2.5

For automating the deployment of a ML model, an Amazon EventBridge Rule for monitoring updates in the Amazon SageMaker Model Package group is created, by targeting the AWS CodePipeline for deployment.

brunopistone / sm-end-to-end-mlops

SageMaker End to End lab for Building, Training, and Deploying ML models

Machine Learning Task

Content overview

1. Machine Learning End to End with Amazon SageMaker

Environment Setup

Labs

1. Prepare data, train ML models using Amazon SageMaker

2. Evaluate, Deploy ML models with Real-Time endpoint, and Monitor ML model quality

2. Automate model deployments with AWS MLOps

Environment Setup

Labs

1. Continuous training with AWS CodePipeline and Amazon SageMaker Pipeline

2. Continuous deployment by using AWS CodePipeline, Amazon EventBridge, and Amazon SageMaker Model Registry

About

Languages