
Event-driven MLOps Pipeline on AWS

This repository contains an example template that deploys the entire ML workflow, including training, inference, and model performance comparison.

This template is written with the AWS Serverless Application Model (SAM) framework. If you want to know more, please check the Doc.

Architecture Overview

Training Workflow

(Training workflow architecture diagram)

Inference Workflow

(Inference workflow architecture diagram)

Usage Guide

Prerequisites

  • Clone or download this repository to your local machine

  • Make sure you have installed the AWS CLI locally. If not, follow the Doc

Installation

  • Run the following command in the code repo root directory
    cd aws-mlops-pipeline 

 

  • The following commands will create:

    • Amazon S3 bucket to temporarily store the code before uploading it to CodeCommit
    • Amazon ECR repository to store the image to be consumed by Amazon SageMaker
    • AWS CodePipeline named aws-mlops-pipeline with
      • AWS CodeCommit repository named aws-mlops-pipeline
      • AWS CodeBuild project named Build to build the image
      • AWS CodeBuild project named Deploy to deploy the AWS SAM template ./template.yaml

     

    chmod +x setup.sh
    ./setup.sh

      (CodePipeline screenshot)

 

Sample Dataset

The original dataset we use is publicly available and was mentioned in the book Discovering Knowledge in Data by Daniel T. Larose. It is attributed by the author to the University of California Irvine Repository of Machine Learning Datasets.

State | Account Length | Area Code | ... | Churn?
RI    | 121            | 827       | ... | False.
MD    | 134            | 776       | ... | True.
...   | ...            | ...       | ... | ...

In this solution, the original dataset is split by the State feature, and the state-based subsets are further separated into three portions (training, inference, and holdout). See the Reference notebook.
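
As a rough illustration of that split, the sketch below groups a raw churn dataset by State and writes each state's rows into training, inference, and holdout files under ./src/data. It is only a sketch: the input file name, split ratios, and random seed are assumptions for illustration, not the exact logic of the reference notebook.

    # A minimal sketch of the state-based split, assuming pandas is installed and the raw
    # data sits in a local churn.csv with a "State" column; the file name, split ratios,
    # and random seed here are illustrative, not the reference notebook's exact values.
    import os
    import pandas as pd

    df = pd.read_csv("churn.csv")

    for state, state_df in df.groupby("State"):
        # Shuffle each state's rows, then carve them into training / inference / holdout.
        state_df = state_df.sample(frac=1.0, random_state=42)
        n = len(state_df)
        portions = {
            "training": state_df.iloc[: int(0.7 * n)],
            "inference": state_df.iloc[int(0.7 * n) : int(0.9 * n)],
            "holdout": state_df.iloc[int(0.9 * n) :],
        }
        for portion, subset in portions.items():
            # Mirrors the ./src/data/<portion>/raw/<STATE>/<STATE>.csv layout shown below.
            out_dir = os.path.join("src", "data", portion, "raw", state)
            os.makedirs(out_dir, exist_ok=True)
            subset.to_csv(os.path.join(out_dir, f"{state}.csv"), index=False)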

You can find the sample dataset in ./src/data with the following file structure:

.
├── ...
├── src
│   ├── data
│   │   ├── holdout
│   │   │   └── raw
│   │   │       ├── MD
│   │   │       │   └── MD.csv
│   │   │       └── RI
│   │   │           └── RI.csv
│   │   ├── inference
│   │   │   └── raw
│   │   │       ├── MD
│   │   │       │   └── MD.csv
│   │   │       └── RI
│   │   │           └── RI.csv
│   │   └── training
│   │       └── raw
│   │           ├── MD
│   │           │   └── MD.csv
│   │           └── RI
│   │               └── RI.csv
│   ├── ...
├── ...

Trigger ML Workflow

  • Visit the Amazon S3 console and there will be two buckets:
    • Code Bucket: mlops-code-bucket-xxxxxxxx
    • Data Bucket: mlops-data-bucket-xxxxxxxx
  • Copy all the files in s3://<Code Bucket>/src/data/ to s3://<Data Bucket>/ (see the sketch after this list)
    (S3 bucket screenshot)
  • Visit the AWS Step Functions console and you should see 4 state machine executions (2 training and 2 inference)
    (Step Functions execution screenshot)
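
If you would rather script the copy than use the console, a minimal boto3 sketch is below. The bucket names are placeholders for the generated ones, and it assumes the objects should land at the root of the Data Bucket with the src/data/ prefix stripped; adjust the destination keys if your setup expects a different layout.

    # Minimal sketch, assuming boto3 is installed, AWS credentials are configured, and the
    # placeholder bucket names below are replaced with the generated Code/Data Bucket names.
    import boto3

    s3 = boto3.client("s3")
    code_bucket = "mlops-code-bucket-xxxxxxxx"  # placeholder: your Code Bucket
    data_bucket = "mlops-data-bucket-xxxxxxxx"  # placeholder: your Data Bucket
    prefix = "src/data/"

    # Copy every object under src/data/ in the Code Bucket into the Data Bucket, dropping
    # the src/data/ prefix so training/, inference/ and holdout/ land at the bucket root.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=code_bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            dest_key = key[len(prefix):]
            if not dest_key:
                continue
            s3.copy_object(
                Bucket=data_bucket,
                Key=dest_key,
                CopySource={"Bucket": code_bucket, "Key": key},
            )
            print(f"copied s3://{code_bucket}/{key} -> s3://{data_bucket}/{dest_key}")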

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
