prvnmali2017 / mlops-e2e

MLOps End-to-End Example using Amazon SageMaker Pipeline, AWS CodePipeline and AWS CDK

This project uses a sample machine learning workload to show how to implement MLOps (CI/CD for machine learning) with Amazon SageMaker, AWS CodePipeline and AWS CDK.

Pre-requisite

Configuration

Source Repo

Option 1: Use GitHub Repo

  1. Fork this repo in your GitHub account
  2. Create a GitHub connection using the CodePipeline console to give CodePipeline access to your GitHub repositories (see the section Create a connection to GitHub (CLI))
  3. Update the GitHub related configuration in the ./configuration/projectConfig.json file
    • Set the value of repoType to git
    • Update the value of githubConnectionArn, githubRepoOwner and githubRepoName
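The configuration update in step 3 can also be scripted. The sketch below is one possible way to do it, assuming that repoType, githubConnectionArn, githubRepoOwner and githubRepoName are top-level keys in projectConfig.json (the file layout is not shown here, so this is an assumption). The helper name set_github_source is hypothetical and not part of this repo.

```python
import json
from pathlib import Path


def set_github_source(config_path, connection_arn, repo_owner, repo_name):
    """Switch the project configuration to a GitHub source repo (Option 1).

    Assumes the four settings are top-level keys of the JSON file.
    Any other keys already in the file are left untouched.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    config.update(
        {
            "repoType": "git",
            "githubConnectionArn": connection_arn,
            "githubRepoOwner": repo_owner,
            "githubRepoName": repo_name,
        }
    )
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call (placeholder values, replace with your own):
# set_github_source(
#     "./configuration/projectConfig.json",
#     connection_arn="<your connection ARN>",
#     repo_owner="<your GitHub user or organisation>",
#     repo_name="mlops-e2e",
# )
```

The function rewrites the file in place, so review the resulting diff before committing it.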

Option 2: Create a CodeCommit Repo in your AWS account

Alternatively, the CDK Infrastructure code can provision a CodeCommit Repo as Source Repo for you.

To switch to this option, set the value of repoType to codecommit in the ./configuration/projectConfig.json file.
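Because the two options need different settings, a quick consistency check before deploying can catch a half-edited file. The following is a minimal sketch, assuming that git and codecommit are the only accepted repoType values and that the GitHub settings are top-level keys. The function name check_source_config is hypothetical.

```python
import json

# Assumption: these are the only two values the infrastructure code accepts.
ALLOWED_REPO_TYPES = {"git", "codecommit"}
GITHUB_KEYS = ("githubConnectionArn", "githubRepoOwner", "githubRepoName")


def check_source_config(config):
    """Return a list of problems found in the source-repo settings."""
    problems = []
    repo_type = config.get("repoType")
    if repo_type not in ALLOWED_REPO_TYPES:
        problems.append(
            f"repoType must be one of {sorted(ALLOWED_REPO_TYPES)}, got {repo_type!r}"
        )
    elif repo_type == "git":
        # The GitHub settings only matter when GitHub is the source.
        for key in GITHUB_KEYS:
            if not config.get(key):
                problems.append(f"{key} is required when repoType is 'git'")
    return problems


def check_config_file(config_path):
    """Load a JSON config file and run the checks above on it."""
    with open(config_path) as handle:
        return check_source_config(json.load(handle))
```

An empty list means the source settings look consistent; it does not verify that the connection ARN or the repo actually exist.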

Usage

Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the AWS Pricing page for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.

Bootstrap

Run the command below to provision all the required infrastructure.

bootstrap.sh

The command can be run repeatedly to deploy any changes in this folder.

Source Code

If repoType is codecommit, wait until the CloudFormation stack is created, then connect to the CodeCommit repo (see the AWS CodeCommit documentation on connecting to a repository) and push the content of this folder to the main branch of the repo.

Note: Depending on your Git settings, your local default branch may not be named main.

Testing Data Set

Download a copy of the testing data set from https://archive.ics.uci.edu/ml/datasets/abalone, and upload it to the Data Source S3 bucket (the bucket name starts with mlopsinfrastracturestack-datasourcedatabucket...) under your preferred folder path, e.g. yyyy/mm/dd/abalone.csv.
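The upload can be done from the S3 console or the AWS CLI. As an alternative, here is a hedged Python sketch that builds a dated key in the yyyy/mm/dd/abalone.csv form and uploads a local file with boto3. It assumes boto3 is installed, that AWS credentials are configured, and that exactly one bucket in the account matches the name prefix above. The helper names dated_key, find_data_bucket and upload_dataset are hypothetical.

```python
from datetime import date

# Bucket name prefix as given in this README (spelling kept as-is).
BUCKET_PREFIX = "mlopsinfrastracturestack-datasourcedatabucket"


def dated_key(day, filename="abalone.csv"):
    """Build an object key of the form yyyy/mm/dd/<filename>."""
    return f"{day:%Y/%m/%d}/{filename}"


def find_data_bucket(bucket_names, prefix=BUCKET_PREFIX):
    """Pick the single bucket whose name starts with the given prefix."""
    matches = [name for name in bucket_names if name.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one bucket starting with {prefix!r}, found {len(matches)}"
        )
    return matches[0]


def upload_dataset(local_path, day=None):
    """Upload a local file to the data source bucket under a dated key."""
    import boto3  # third-party; imported here so the helpers above work without it

    s3 = boto3.client("s3")
    names = [bucket["Name"] for bucket in s3.list_buckets()["Buckets"]]
    bucket = find_data_bucket(names)
    key = dated_key(day or date.today())
    s3.upload_file(local_path, bucket, key)
    return bucket, key
```

Calling upload_dataset("abalone.csv") would place the file under today's date; pass an explicit date to use a different folder path.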

Cleanup

To clean up all the infrastructure, run the command below:

cleanup.sh

Sample Machine Learning Project

The project is created based on the SageMaker Project Template - MLOps template for model building, training and deployment.

In this example, we solve the abalone age prediction problem using the UCI Machine Learning Abalone Dataset. The aim of the task is to determine the age of an abalone (a kind of shellfish) from its physical measurements. At its core, it is a regression problem.
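To make the regression framing concrete, the sketch below parses rows in the UCI Abalone format and scores a trivial baseline with mean squared error. It is only an illustration and not the code in src: the column names are my own labels for the dataset's documented column order, the sample rows are illustrative, and the choice of MSE as the metric is an assumption. In the dataset the target is the ring count, which the dataset description relates to age as roughly rings + 1.5 years.

```python
import csv
import io

# Column order documented for the UCI Abalone dataset (the raw file has no header row).
COLUMNS = [
    "sex", "length", "diameter", "height", "whole_weight",
    "shucked_weight", "viscera_weight", "shell_weight", "rings",
]

# Illustrative rows in the dataset's format.
SAMPLE = """\
M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
"""


def load_rows(text):
    """Parse header-less CSV text into dicts with numeric measurements."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        row = dict(zip(COLUMNS, record))
        for name in COLUMNS[1:]:  # every column except sex is numeric
            row[name] = float(row[name])
        rows.append(row)
    return rows


def mean_squared_error(actual, predicted):
    """Average squared difference between targets and predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


rows = load_rows(SAMPLE)
rings = [row["rings"] for row in rows]
# Baseline: always predict the mean ring count. A trained model should beat this.
baseline = [sum(rings) / len(rings)] * len(rings)
baseline_mse = mean_squared_error(rings, baseline)
```

A constant-mean baseline like this gives a reference error that any trained regressor can be compared against.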

Project Layout

  • buildspecs: Build specification files used by CodeBuild projects
  • configuration: Project and Pipeline configuration
  • docs: Images used in the documentation
  • infrastructure: AWS CDK app for provisioning the end-to-end MLOps infrastructure
  • ml_pipeline: The SageMaker pipeline definition expressing the ML steps involved in generating an ML model and helper scripts
  • model_deploy: AWS CDK app for deploying the model to a SageMaker endpoint
  • scripts: Bash scripts used in the CI/CD pipeline
  • src: Machine learning code for preprocessing and evaluating the ML model
  • tests: Unit testing code for testing machine learning code

Overall Architecture

The overall architecture of the sample project is shown below:

[Figure: Overall Architecture]

License

This project is licensed under the MIT No Attribution license.

Contributing

Refer to CONTRIBUTING for more details on how to contribute to this project.

About

License: MIT No Attribution

Languages

Python 46.5%, TypeScript 42.3%, JavaScript 7.0%, Shell 4.2%