eparamasari / Operationalizing_Machine_Learning_in_Azure


Operationalizing Machine Learning with Microsoft Azure Machine Learning

This is the second project of the Machine Learning Engineer Nanodegree with Microsoft Azure from Udacity.

Overview

In this project, an AutoML run was performed in Azure ML Studio on a bank marketing dataset, with the aim of predicting whether contacted customers would subscribe to the bank product offered.

The best model found was then deployed from Azure ML Studio and consumed through its REST API endpoint using key-based authentication.

A pipeline was then created and published in the same experiment using the Azure ML Python SDK. The pipeline makes it easier to share the workflow and rerun the experiment in the future.

Architectural Diagram

Here is an architectural diagram of the project:

Architectural Diagram

Key Steps

1. Upload bank marketing dataset

In this step, the bank marketing dataset was uploaded.

    Bank Marketing Data
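The upload can also be scripted with the Azure ML Python SDK. A minimal sketch, assuming a `config.json` downloaded from the Studio; the dataset URL and registered name are illustrative:

```python
# Sketch: registering the bank marketing dataset with the Azure ML SDK.
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()  # reads config.json from the workspace

# Publicly hosted copy of the bank marketing training data (illustrative URL)
data_url = ("https://automlsamplenotebookdata.blob.core.windows.net/"
            "automl-sample-notebook-data/bankmarketing_train.csv")

dataset = Dataset.Tabular.from_delimited_files(path=data_url)
dataset = dataset.register(workspace=ws, name="bankmarketing_train",
                           description="Bank marketing dataset")
```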

2. Run AutoML with auto featurization

In this step, an Automated ML run was performed as a classification task with automatic featurization and AUC Weighted as the primary metric to be optimized.

The best model was found to be a Voting Ensemble, a robust model because it combines the 'votes', i.e. the label predictions, of several different models trained on the dataset.

    AutoML Run Completed
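The run configuration above can be sketched as an `AutoMLConfig` fragment; the task, primary metric, and featurization mode come from this step, while the remaining argument values (label column, timeout, compute target) are assumptions:

```python
# Sketch of the AutoML configuration for this step; values other than the
# task, metric, and featurization mode are illustrative.
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="AUC_weighted",
    featurization="auto",
    training_data=dataset,          # registered bank marketing dataset
    label_column_name="y",          # subscription outcome column (assumed)
    experiment_timeout_minutes=30,
    compute_target=compute_target,  # an existing AmlCompute cluster
)
```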

3. Deploy the best model

In this step, the best model from the AutoML run, Voting Ensemble, was deployed using Azure Container Instance.

    Best Model Deployed

    Deployed Model
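Deployment to ACI can also be done from the SDK. A sketch, assuming the AutoML run object from the previous step; the service name, entry script, and resource sizes are illustrative:

```python
# Sketch: deploying the best AutoML model to Azure Container Instance.
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

best_run, fitted_model = remote_run.get_output()   # from the AutoML run
model = best_run.register_model(model_name="bankmarketing-automl")

inference_config = InferenceConfig(entry_script="score.py",
                                   environment=best_run.get_environment())
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1,
                                                auth_enabled=True)

service = Model.deploy(ws, "bankmarketing-endpoint", [model],
                       inference_config, aci_config)
service.wait_for_deployment(show_output=True)
```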

4. Enable logging

In this step, logging was enabled with a Python script, after which the Application Insights page could be used to monitor the app.

    Logging Results

    Application Insight Enabled

    Logging Results and Application Insights
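The logging script for this step can be sketched as follows; the service name is illustrative:

```python
# Sketch: enabling Application Insights on the deployed service and
# printing its logs.
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(ws, name="bankmarketing-endpoint")
service.update(enable_app_insights=True)
print(service.get_logs())
```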

5. Check swagger documentation

In this step, Swagger UI was used to inspect the input required for an API request to obtain predictions from the deployed model. Two HTTP methods were documented for the endpoint: GET and POST.

    Swagger UI

    Swagger: GET

    Swagger: POST
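The same information can be read programmatically from the `swagger.json` that the service publishes. A small helper; the sample document below follows the standard Swagger 2.0 shape and is not copied from this deployment:

```python
import json
from urllib.request import urlopen


def list_operations(swagger: dict) -> list[tuple[str, str]]:
    """Return (method, path) pairs described in a Swagger/OpenAPI document."""
    ops = []
    for path, methods in swagger.get("paths", {}).items():
        for method in methods:
            ops.append((method.upper(), path))
    return sorted(ops)


# Against a live deployment one would fetch the document from
# service.swagger_uri, e.g. json.load(urlopen(service.swagger_uri)).
sample = {"paths": {"/": {"get": {}}, "/score": {"post": {}}}}
print(list_operations(sample))  # [('GET', '/'), ('POST', '/score')]
```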

6. Consume model endpoint

After establishing the JSON structure of the input, a Python script was run in this step to get predictions from the endpoint by sending new data in the required input structure.

    Consume Endpoint
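The request can be sketched with only the standard library. The endpoint URI, key, and feature names below are placeholders; the real input schema must match what `swagger.json` documents for the deployment:

```python
import json
from urllib.request import Request, urlopen


def build_payload(records: list[dict]) -> bytes:
    """Wrap feature records in the {'data': [...]} envelope the scoring
    endpoint expects and encode them as a JSON request body."""
    return json.dumps({"data": records}).encode("utf-8")


def score(uri: str, key: str, records: list[dict]) -> dict:
    """POST records to the deployed endpoint using key authentication."""
    req = Request(uri, data=build_payload(records),
                  headers={"Content-Type": "application/json",
                           "Authorization": f"Bearer {key}"})
    with urlopen(req) as resp:
        return json.loads(resp.read())


# Illustrative record; feature names follow the bank marketing schema.
body = build_payload([{"age": 40, "job": "blue-collar", "marital": "married"}])
print(json.loads(body)["data"][0]["age"])  # 40
```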

7. Create, publish, and consume a pipeline

After the model was deployed, a pipeline was created and published to make the workflow easy to reproduce. Another pipeline run was then scheduled and eventually rerun.

Completed pipeline run:

    Run Completed

    Published Pipeline

    Active Endpoint

The bank marketing dataset with the AutoML module:

    Dataset and AutoML Module

Published pipeline overview:

    Published Pipeline Overview

    Pipeline Running
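Publishing the pipeline and triggering a new run through its REST endpoint can be sketched as follows; the step list, names, and experiment name are illustrative, and `automl_step` is assumed to be an `AutoMLStep` built earlier in the notebook:

```python
# Sketch: publish the AutoML pipeline and trigger a run via its REST endpoint.
import requests
from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.pipeline.core import Pipeline

ws = Workspace.from_config()
pipeline = Pipeline(workspace=ws, steps=[automl_step])

published = pipeline.publish(name="Bankmarketing Train",
                             description="AutoML pipeline for bank marketing")

# Consume the published pipeline through its REST endpoint
auth_header = InteractiveLoginAuthentication().get_authentication_header()
response = requests.post(published.endpoint, headers=auth_header,
                         json={"ExperimentName": "pipeline-rest-endpoint"})
print(response.json().get("Id"))  # id of the newly submitted pipeline run
```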

Screen Recording

Below is a link to a screen recording of the project in action.

https://youtu.be/53lSjGeCU0c

How to Improve

  • Accuracy could be used as the primary metric in a new AutoML run to compare the results with the current run, which used AUC Weighted.

  • Data cleaning could be performed prior to running AutoML to increase accuracy.

  • AutoML's Deep Learning capability could be enabled and the results compared to those obtained without Deep Learning.

  • A benchmark could be added to similar projects to serve as a monitoring baseline.
