This sample project uses a machine learning use case to showcase how to implement MLOps, that is, CI/CD for machine learning, using Amazon SageMaker, AWS CodePipeline and AWS CDK.
- Python (version 3.8 or higher)
- NodeJS (version 14 or higher)
- Yarn (installed via `npm install -g yarn`)
- TypeScript (installed via `npm install -g typescript`)
- AWS CDK CLI (installed via `npm install -g aws-cdk`)
- AWS CLI (version 2 or higher)
- AWS CLI configuration (configured via `aws configure`)
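Before provisioning anything, it can help to confirm the prerequisites above are installed. The sketch below is not part of the project; it assumes the tools are on `PATH` under the names `python3`, `node`, `yarn`, `tsc`, `cdk` and `aws`, and that each prints its version for `--version`. Tools with no stated minimum version are only checked for presence.

```python
import re
import shutil
import subprocess

# Minimum versions from the prerequisites list; an empty tuple means only
# presence is required. Executable names are assumptions about a typical install.
MINIMUMS = {
    "python3": (3, 8),
    "node": (14,),
    "yarn": (),
    "tsc": (),
    "cdk": (),
    "aws": (2,),
}


def parse_version(text: str) -> tuple:
    """Extract the first dotted version number, e.g. from 'v14.21.3'
    or 'aws-cli/2.13.0 Python/3.11.4 ...'."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: tuple, minimum: tuple) -> bool:
    # Compare only as many components as the minimum specifies,
    # so (3, 10, 4) satisfies (3, 8).
    return version[: len(minimum)] >= minimum


def check_tools() -> dict:
    results = {}
    for tool, minimum in MINIMUMS.items():
        if shutil.which(tool) is None:
            results[tool] = "missing"
            continue
        if not minimum:
            results[tool] = "ok"
            continue
        proc = subprocess.run([tool, "--version"], capture_output=True, text=True)
        # Some tools print their version on stderr, so fall back to it.
        output = proc.stdout.strip() or proc.stderr.strip()
        ok = meets_minimum(parse_version(output), minimum)
        results[tool] = "ok" if ok else "too old"
    return results


if __name__ == "__main__":
    for tool, status in check_tools().items():
        print(f"{tool}: {status}")
```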
- Fork this repo into your GitHub account
- Create a GitHub connection using the CodePipeline console to give CodePipeline access to your GitHub repositories (see the section "Create a connection to GitHub (CLI)")
- Update the GitHub-related configuration in the `./configuration/projectConfig.json` file:
  - Set the value of `repoType` to `git`
  - Update the values of `githubConnectionArn`, `githubRepoOwner` and `githubRepoName`
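The configuration change in the steps above can also be scripted. This is a minimal sketch that assumes `repoType`, `githubConnectionArn`, `githubRepoOwner` and `githubRepoName` are top-level keys in `projectConfig.json`; if the real file nests them, adjust the key paths. The ARN, owner and repository values in the usage are placeholders.

```python
import json
from pathlib import Path

CONFIG_PATH = Path("./configuration/projectConfig.json")


def set_github_source(config: dict, connection_arn: str, owner: str, repo: str) -> dict:
    """Return a copy of the config that points the pipeline at a GitHub repo.

    Assumes the GitHub settings are top-level keys (not confirmed by the docs).
    """
    updated = dict(config)
    updated["repoType"] = "git"
    updated["githubConnectionArn"] = connection_arn
    updated["githubRepoOwner"] = owner
    updated["githubRepoName"] = repo
    return updated


def update_config_file(path: Path, **github_settings: str) -> None:
    config = json.loads(path.read_text())
    config = set_github_source(config, **github_settings)
    # indent=2 keeps the file readable and diff-friendly.
    path.write_text(json.dumps(config, indent=2) + "\n")


if __name__ == "__main__":
    # Placeholder values: replace with your connection ARN and your fork.
    update_config_file(
        CONFIG_PATH,
        connection_arn="arn:aws:codestar-connections:us-east-1:111122223333:connection/EXAMPLE",
        owner="your-github-user",
        repo="your-fork-name",
    )
```

Returning a modified copy rather than mutating in place keeps `set_github_source` easy to test without touching the file system.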
Alternatively, the CDK infrastructure code can provision a CodeCommit repository as the source repository for you.
To use this option, set the value of `repoType` to `codecommit` in the `./configuration/projectConfig.json` file.
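Since the two source options need different settings, a quick check before running the bootstrap script can catch a half-edited configuration. This sketch is hypothetical (the project does not ship it) and again assumes the keys sit at the top level of `projectConfig.json`.

```python
import json
from pathlib import Path

# Keys the GitHub option needs, per the setup steps.
GITHUB_KEYS = ("githubConnectionArn", "githubRepoOwner", "githubRepoName")


def validate_source_config(config: dict) -> list:
    """Return a list of problems with the source-repository settings."""
    repo_type = config.get("repoType")
    if repo_type == "codecommit":
        # The CDK code provisions the CodeCommit repo, so nothing else is required.
        return []
    if repo_type == "git":
        return [f"missing or empty {key}" for key in GITHUB_KEYS if not config.get(key)]
    return [f"repoType must be 'git' or 'codecommit', got {repo_type!r}"]


if __name__ == "__main__":
    cfg = json.loads(Path("./configuration/projectConfig.json").read_text())
    problems = validate_source_config(cfg)
    print("\n".join(problems) if problems else "source configuration looks complete")
```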
Important: this application uses various AWS services, and there are costs associated with these services beyond Free Tier usage. Please see the AWS Pricing page for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.
Run the command below to provision all the required infrastructure.
bootstrap.sh
The command can be run repeatedly to deploy any changes made in this folder.
If `repoType` is `codecommit`, then after the CloudFormation stack is created, follow this page to connect to the CodeCommit repository and push the contents of this folder to the repository's main branch.
Note: depending on your Git settings, your default branch may not be named main.
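The push step can be sketched as below. It assumes this folder is already a Git repository with at least one commit, and that Git credentials for CodeCommit over HTTPS are already set up as described on the linked page. The region, repository name and remote name `codecommit` are hypothetical placeholders. Pushing `HEAD:main` sends the current commit to the remote `main` branch even if your local default branch has a different name.

```python
import subprocess


def codecommit_https_url(region: str, repo_name: str) -> str:
    """HTTPS clone URL format used by CodeCommit."""
    return f"https://git-codecommit.{region}.amazonaws.com/v1/repos/{repo_name}"


def push_commands(region: str, repo_name: str, branch: str = "main") -> list:
    url = codecommit_https_url(region, repo_name)
    return [
        # "codecommit" is a hypothetical remote name; any unused name works.
        ["git", "remote", "add", "codecommit", url],
        # HEAD:<branch> pushes the current commit to the remote branch,
        # whatever the local branch happens to be called.
        ["git", "push", "codecommit", f"HEAD:{branch}"],
    ]


if __name__ == "__main__":
    # Placeholder region and repository name: take the real values from the
    # CloudFormation stack outputs or the CodeCommit console.
    for cmd in push_commands("us-east-1", "your-codecommit-repo"):
        subprocess.run(cmd, check=True)
```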
Download a copy of the test dataset from https://archive.ics.uci.edu/ml/datasets/abalone and upload it to the Data Source S3 bucket (the bucket name starts with mlopsinfrastracturestack-datasourcedatabucket...) under your preferred folder path, e.g. yyyy/mm/dd/abalone.csv.
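The upload can also be done with boto3. This sketch finds the Data Source bucket by the name prefix given above and stores the file under a `yyyy/mm/dd/` path built from today's date. It assumes boto3 is installed, your AWS credentials are configured, exactly one bucket matches the prefix, and the downloaded data was saved locally as `abalone.csv`.

```python
import datetime as dt
from pathlib import Path

# The full bucket name is generated by CloudFormation, so look it up by prefix.
BUCKET_PREFIX = "mlopsinfrastracturestack-datasourcedatabucket"


def find_data_bucket(bucket_names, prefix: str = BUCKET_PREFIX) -> str:
    matches = [name for name in bucket_names if name.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"expected one bucket starting with {prefix!r}, found {matches}")
    return matches[0]


def dated_key(day: dt.date, filename: str = "abalone.csv") -> str:
    """Build an object key in the yyyy/mm/dd/<file> layout."""
    return f"{day:%Y/%m/%d}/{filename}"


def upload_dataset(local_path: str) -> str:
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3")
    names = [bucket["Name"] for bucket in s3.list_buckets()["Buckets"]]
    bucket = find_data_bucket(names)
    key = dated_key(dt.date.today(), Path(local_path).name)
    s3.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


if __name__ == "__main__":
    print(upload_dataset("abalone.csv"))
```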
To clean up all the infrastructure, run the command below:
cleanup.sh
This project is based on the SageMaker project template "MLOps template for model building, training, and deployment".
In this example, we solve the abalone age prediction problem using a sample dataset: the UCI Machine Learning Abalone Dataset. The aim of this task is to determine the age of an abalone (a kind of shellfish) from its physical measurements. At its core, it is a regression problem.
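To make the regression framing concrete, here is a small sketch of how one record can be turned into features and a numeric target, plus a mean squared error metric for scoring predictions. It is an illustration, not the code in `src`. It assumes the standard UCI column order (sex, seven physical measurements, ring count) with no header row, and it uses the ring count as the target; the dataset description states that rings + 1.5 gives the age in years.

```python
# Column order as documented for the UCI Abalone dataset.
COLUMNS = [
    "sex", "length", "diameter", "height", "whole_weight",
    "shucked_weight", "viscera_weight", "shell_weight", "rings",
]
SEX_CATEGORIES = ("M", "F", "I")  # male, female, infant


def parse_row(line: str):
    """Turn one CSV line into (features, target).

    The categorical sex column is one-hot encoded, the seven measurements
    are kept as floats, and the ring count is the regression target.
    """
    values = line.strip().split(",")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    sex, measurements, rings = values[0], values[1:8], values[8]
    one_hot = [1.0 if sex == cat else 0.0 for cat in SEX_CATEGORIES]
    return one_hot + [float(v) for v in measurements], float(rings)


def rings_to_age(rings: float) -> float:
    # Per the dataset description, rings + 1.5 gives the age in years.
    return rings + 1.5


def mean_squared_error(y_true, y_pred) -> float:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Assumes a local copy of the data saved as abalone.csv.
    with open("abalone.csv") as f:
        rows = [parse_row(line) for line in f if line.strip()]
    targets = [target for _, target in rows]
    # Predicting the mean everywhere gives a baseline a real model should beat.
    mean = sum(targets) / len(targets)
    print("baseline MSE:", mean_squared_error(targets, [mean] * len(targets)))
```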
- `buildspecs`: Build specification files used by CodeBuild projects
- `configuration`: Project and pipeline configuration
- `docs`: Images used in the documentation
- `infrastructure`: AWS CDK app for provisioning the end-to-end MLOps infrastructure
- `ml_pipeline`: The SageMaker pipeline definition expressing the ML steps involved in generating an ML model, plus helper scripts
- `model_deploy`: AWS CDK app for deploying the model on a SageMaker endpoint
- `scripts`: Bash scripts used in the CI/CD pipeline
- `src`: Machine learning code for preprocessing and evaluating the ML model
- `tests`: Unit tests for the machine learning code
The overall architecture of the sample project is shown below:
This project is licensed under the MIT License.
Refer to CONTRIBUTING for more details on how to contribute to this project.