Operationalizing Machine Learning on Azure

Overview

This project demonstrates how to apply MLOps principles in Azure. MLOps (machine learning operations) is the practice of applying DevOps principles to machine learning, covering tasks such as deploying a model, consuming endpoints, and automating a pipeline.

In this project, we work with the Bank Marketing dataset to predict whether a potential client will subscribe to the bank's term deposit. A model is trained using AutoML, and the best model is then deployed and consumed via a REST endpoint.

Architectural Diagram

Workflow

(Workflow Taken from Udacity)

Key Steps

Authentication

There was no need to authenticate, as I used the environment provided by Udacity, where authentication was already configured.

AutoML model

Dataset Used

As mentioned earlier, we used the Bank Marketing dataset in this project.

dataset

Completed Experiment

We create an AutoML run on this dataset to solve a classification problem, with explanations enabled for the best model.

completed_experiment
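For context, a minimal sketch of how such an AutoML classification run can be configured with the Azure ML SDK v1 is shown below. The dataset name, target column, and compute cluster name are assumptions and may differ from the project's actual settings.

```python
from azureml.core import Workspace, Dataset, Experiment
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name="bankmarketing")  # assumed name of the registered dataset

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="accuracy",
    training_data=dataset,
    label_column_name="y",            # assumed target column: did the client subscribe?
    compute_target="cpu-cluster",     # assumed compute cluster name
    experiment_timeout_minutes=30,
    max_concurrent_iterations=5,
    model_explainability=True,        # explain the best model
)

experiment = Experiment(ws, "automl-bankmarketing")
run = experiment.submit(automl_config, show_output=True)
best_run, fitted_model = run.get_output()  # retrieve the best child run and its model
```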

Best Model

The best model was a Voting Ensemble with an accuracy of 92%.

best_model

Deploy the Best Model

The Voting Ensemble model was then deployed to an Azure Container Instance (ACI) with authentication enabled.

deploy_model
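A sketch of deploying a registered best model to ACI with key-based authentication enabled follows. The model name, service name, entry script, and curated environment below are illustrative assumptions, not the project's exact values.

```python
from azureml.core import Workspace, Model, Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="best-automl-model")      # assumed registered model name

inference_config = InferenceConfig(
    entry_script="score.py",                     # assumed scoring script
    environment=Environment.get(ws, "AzureML-AutoML"),  # assumed curated environment
)

aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    auth_enabled=True,   # enable key-based authentication on the endpoint
)

service = Model.deploy(ws, "bankmarketing-deploy", [model], inference_config, aci_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```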

Enable Logging

After deploying the best model, logs.py was executed to enable Application Insights and retrieve the logs.

logs
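The gist of what such a logging script does is sketched below; the service name is an assumption.

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(ws, name="bankmarketing-deploy")  # assumed name of the deployed ACI service

# Turn on Application Insights for the deployed endpoint
service.update(enable_app_insights=True)

# Print the container logs of the web service
print(service.get_logs())
```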

Consume Model Endpoint

Swagger

Swagger is a tool that displays the contents of an API in an easy-to-read manner. A swagger.json file was downloaded from Azure, and serve.py and swagger.sh were executed to run Swagger on localhost. The images below show Swagger running on localhost and the responses of the model.

swagger_1

swagger_2

swagger_3
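For reference, the local serving step boils down to exposing the downloaded swagger.json over HTTP so the Swagger UI (typically started as a Docker container by swagger.sh) can fetch it. This is a sketch of the idea behind serve.py, not the exact script; the port and CORS handling are assumptions.

```python
# Serve the directory containing swagger.json on localhost, e.g. at
# http://localhost:8000/swagger.json, for the Swagger UI to consume.
from http.server import HTTPServer, SimpleHTTPRequestHandler


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Serve files with a permissive CORS header so the Swagger UI can fetch swagger.json."""

    def end_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


if __name__ == "__main__":
    HTTPServer(("", 8000), CORSRequestHandler).serve_forever()  # assumed port
```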

Endpoint

To interact with the deployed model, we copy the REST endpoint and the primary key of the model into the endpoint.py script and then run the script. The screenshot below shows the responses retrieved from the model.

endpoint
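A sketch of what a consumption script like endpoint.py does once the scoring URI and primary key are filled in is shown below. The URI, key, and payload fields are placeholders; the records contain only a few of the Bank Marketing features for illustration.

```python
import json
import requests

scoring_uri = "http://<your-aci-endpoint>.azurecontainer.io/score"  # REST endpoint of the service
key = "<primary-key>"                                               # primary key from Azure ML studio

# Two sample records with a subset of the Bank Marketing features (values are illustrative)
data = {
    "data": [
        {"age": 35, "job": "technician", "marital": "married", "education": "university.degree"},
        {"age": 52, "job": "retired", "marital": "single", "education": "high.school"},
    ]
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",
}

response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
print(response.json())  # e.g. a list of "yes"/"no" predictions
```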

Benchmark

To benchmark the model, we use the data.json file created by the endpoint.py script. This JSON file is then used by benchmark.sh to retrieve performance results.

benchmark
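Under the hood, such a benchmark amounts to timing repeated POST requests against the scoring URI with data.json as the body. A rough Python sketch of that idea (not the project's benchmark.sh; URI and key are placeholders):

```python
import time
import requests

scoring_uri = "http://<your-aci-endpoint>.azurecontainer.io/score"  # placeholder
key = "<primary-key>"                                               # placeholder

with open("data.json") as f:       # payload written by endpoint.py
    payload = f.read()

headers = {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}

latencies = []
for _ in range(10):                # small, fixed number of requests
    start = time.perf_counter()
    requests.post(scoring_uri, data=payload, headers=headers)
    latencies.append(time.perf_counter() - start)

print(f"requests: {len(latencies)}, mean latency: {sum(latencies) / len(latencies):.3f}s")
```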

Create and Publish a Pipeline

To create and publish a pipeline, we run all the cells in the Jupyter notebook. The screenshots below show the results of running the notebook.

Create Pipeline

The first step is to create a pipeline:

pipeline_run

The pipeline endpoint:

pipeline_endpoint

Dataset Used:

pipeline_dataset
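Conceptually, the notebook wraps the AutoML configuration in an AutoMLStep and submits the resulting pipeline as an experiment. A condensed sketch is shown below; the dataset, column, compute, output, and experiment names are assumptions.

```python
from azureml.core import Workspace, Dataset, Experiment
from azureml.pipeline.core import Pipeline, PipelineData, TrainingOutput
from azureml.pipeline.steps import AutoMLStep
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, "bankmarketing")   # assumed registered dataset name

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="accuracy",
    training_data=dataset,
    label_column_name="y",           # assumed target column
    compute_target="cpu-cluster",    # assumed compute cluster name
)

# Pipeline outputs that capture the metrics and the best model of the AutoML step
metrics_data = PipelineData(
    "metrics_data", datastore=ws.get_default_datastore(),
    pipeline_output_name="metrics_output",
    training_output=TrainingOutput(type="Metrics"),
)
model_data = PipelineData(
    "model_data", datastore=ws.get_default_datastore(),
    pipeline_output_name="best_model_output",
    training_output=TrainingOutput(type="Model"),
)

automl_step = AutoMLStep(
    name="automl_module",
    automl_config=automl_config,
    outputs=[metrics_data, model_data],
    allow_reuse=True,
)

pipeline = Pipeline(ws, steps=[automl_step])
pipeline_run = Experiment(ws, "bankmarketing-pipeline").submit(pipeline)
```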

Publish Pipeline

We then publish the pipeline and use the RunDetails widget to monitor its progress.

Pipeline

pipeline_active

RunDetails

pipeline_rundetails

In the published pipeline overview, we can see that the status is ACTIVE and a REST endpoint is available.

Active

pipeline_active
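Continuing from the pipeline sketch above, publishing the completed run and triggering the published pipeline over its REST endpoint looks roughly like this; the pipeline and experiment names are illustrative.

```python
import requests
from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.widgets import RunDetails

ws = Workspace.from_config()

# Monitor the pipeline run in the notebook (pipeline_run comes from the previous sketch)
RunDetails(pipeline_run).show()
pipeline_run.wait_for_completion()

# Publish the pipeline so it gets a durable REST endpoint
published_pipeline = pipeline_run.publish_pipeline(
    name="Bankmarketing Train",            # assumed pipeline name
    description="Training bankmarketing pipeline",
    version="1.0",
)
print(published_pipeline.endpoint)

# Trigger the published pipeline over REST with an AAD token
auth_header = InteractiveLoginAuthentication().get_authentication_header()
response = requests.post(
    published_pipeline.endpoint,
    headers=auth_header,
    json={"ExperimentName": "pipeline-rest-endpoint"},  # assumed experiment name
)
print(response.json().get("Id"))  # id of the newly triggered pipeline run, if returned
```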

Screen Recording

https://youtu.be/JByL8p3lsnI

Future Improvements

  • Various scripts were run in the terminal. This could be streamlined by running them in the Jupyter notebook or by combining them into a single script.
  • Use ParallelRunStep to create a pipeline step that processes large amounts of data asynchronously and in parallel (see the sketch below).
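A sketch of what adding such a ParallelRunStep could look like; the entry script, dataset, environment, and compute names are assumptions for illustration.

```python
from azureml.core import Workspace, Dataset, Environment
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

ws = Workspace.from_config()
batch_dataset = Dataset.get_by_name(ws, "bankmarketing")   # data to score in parallel
output = OutputFileDatasetConfig(name="batch_scores")      # where appended results land

parallel_run_config = ParallelRunConfig(
    source_directory=".",
    entry_script="batch_score.py",     # hypothetical per-mini-batch scoring script
    mini_batch_size="1MB",
    error_threshold=10,
    output_action="append_row",
    environment=Environment.get(ws, "AzureML-AutoML"),  # assumed environment
    compute_target="cpu-cluster",                       # assumed compute cluster name
    node_count=2,
)

batch_step = ParallelRunStep(
    name="parallel-batch-scoring",
    parallel_run_config=parallel_run_config,
    inputs=[batch_dataset.as_named_input("batch_data")],
    output=output,
)

pipeline = Pipeline(ws, steps=[batch_step])
```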

About

Project from my Machine Learning Engineer with Azure Nano-Degree program at Udacity

