
Operationalizing Machine Learning

Project Overview

This project is primarily based on using AutoML in Azure Machine Learning Studio to train a model and deploy it as a web service (documented with Swagger), and then performing the same tasks with the Azure Python SDK by creating an ML pipeline.

The first part consists of creating a machine learning production model using AutoML in Azure Machine Learning Studio, then deploying the best model and consuming it with the help of Swagger UI, using the REST API endpoint and the key produced for the deployed model.

The second part of the project follows the same steps, but this time using the Azure Python SDK to create, train, and publish a pipeline. For this part, I am using the Jupyter Notebook provided. The whole procedure is explained in this README file, and the result is demonstrated in the screencast video. For both parts of the project, the data relates to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether a client will subscribe to a bank term deposit. The result of the prediction appears in column y and is either yes or no.

Architectural Diagram

Project overview and architectural diagram

Key Steps

The key steps of the project are described below:

1. Authentication:

   This step is omitted since it could not be implemented in the lab space provided by Udacity,
   because I am not authorized to create a service principal. However, I am still mentioning it here
   as it is a crucial step when using one's own Azure account.
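
   For reference, on a personal Azure account the non-interactive login could look like this minimal sketch using the azureml-core SDK (all IDs and the secret are placeholders for values from your own service principal):

```python
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# Placeholder credentials from your own service principal.
sp_auth = ServicePrincipalAuthentication(
    tenant_id="<tenant-id>",
    service_principal_id="<client-id>",
    service_principal_password="<client-secret>",
)

# Non-interactive authentication against the workspace.
ws = Workspace.get(
    name="<workspace-name>",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    auth=sp_auth,
)
```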

2. Automated ML Experiment:

   At this point, security is enabled and authentication is completed. This step involves creating
   an experiment using Automated ML, configuring a compute cluster, and using that cluster to run
   the experiment.
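
   For comparison, a minimal SDK sketch of the same configuration (the dataset name, cluster name, metric, and timeout are assumptions chosen to mirror the Studio settings):

```python
from azureml.core import Dataset, Experiment, Workspace
from azureml.core.compute import ComputeTarget
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()

# Assumed names: adjust to your registered dataset and compute cluster.
dataset = Dataset.get_by_name(ws, name="bankmarketing_train")
compute_target = ComputeTarget(workspace=ws, name="automl-cluster")

automl_config = AutoMLConfig(
    task="classification",
    training_data=dataset,
    label_column_name="y",          # target: yes/no term-deposit subscription
    primary_metric="accuracy",
    n_cross_validations=5,
    experiment_timeout_minutes=30,
    compute_target=compute_target,
)

run = Experiment(ws, "automl-bankmarketing").submit(automl_config)
run.wait_for_completion(show_output=True)
```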

Registered dataset

Registered dataset detailed view

AutoML experiment configuration

AutoML experiment completed

Best model

3. Deploy the Best Model:

   After the completion of the experiment run, a summary of all the models and their metrics is
   shown, including explanations. The best model appears in the Details tab and also first in the
   Models tab. This is the model that should be selected for deployment. Deploying it exposes an
   HTTP API service that allows interaction with the model by sending data over POST requests.
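
   The deployment was done through the Studio UI; a rough SDK equivalent, with the model, environment, script, and service names as assumptions, is:

```python
from azureml.core import Environment, Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()

# Assumed name of the model registered from the AutoML run.
model = Model(ws, name="best-automl-model")

# Hypothetical scoring script and conda environment file.
env = Environment.from_conda_specification("inference-env", "conda_env.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# ACI deployment with key-based authentication enabled.
deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    auth_enabled=True,
)

service = Model.deploy(
    ws, "bankmarketing-endpoint", [model], inference_config, deployment_config
)
service.wait_for_deployment(show_output=True)
```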

Best model metrics

4. Enable Logging:

  After the deployment of the best model, I enabled Application Insights and retrieved the logs.
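
  A minimal sketch of what the logs script does (the service name is an assumption carried over from the deployment step):

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()

# Assumed name of the deployed service.
service = Webservice(ws, name="bankmarketing-endpoint")

# Turn on Application Insights for the already-deployed service.
service.update(enable_app_insights=True)

# Retrieve and print the container logs of the endpoint.
print(service.get_logs())
```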

Details tab of the endpoint showing Application Insights enabled

Running the logs script

5. Swagger Documentation:

    This is the step where the deployed model is consumed using Swagger. Azure provides a swagger.json file for deployed models.
    The deployed model can be found in the Endpoints section, where it should be the first one on the list.
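
    The SDK exposes the location of the auto-generated swagger.json; a sketch for downloading it locally so Swagger UI can render it (service name assumed as before):

```python
import json

import requests
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(ws, name="bankmarketing-endpoint")  # assumed name

# Download the auto-generated Swagger (OpenAPI) document for the endpoint.
response = requests.get(service.swagger_uri)
with open("swagger.json", "w") as f:
    json.dump(response.json(), f, indent=4)
```

    The downloaded file is then served locally alongside Swagger UI so the endpoint's HTTP methods and expected payload can be inspected.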

Running the Swagger script

Swagger response and methods

6. Consume Model Endpoints:

   Once the model is deployed, I use the endpoint.py script to interact with the trained model. I run the script
   with the scoring_uri that was generated after deployment and, since I enabled authentication, the primary key
   of the service. This URI is found in the Details tab, above the Swagger URI.
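
   A sketch of such a script; the URI, key, and feature values are placeholders, and the real payload uses the full Bank Marketing feature set:

```python
import json

import requests

# Placeholders: copy these from the endpoint's Details tab after deployment.
scoring_uri = "http://<deployment>.azurecontainer.io/score"
key = "<primary-key>"

# Illustrative records; the real schema has the full set of dataset columns.
data = {"data": [
    {"age": 35, "job": "technician", "marital": "married"},
    {"age": 52, "job": "retired", "marital": "single"},
]}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",
}

# POST the records and print the yes/no predictions returned by the model.
response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
print(response.json())
```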

Endpoint script run

7. Create and Publish a Pipeline:

   In this part of the project, I use the provided Jupyter Notebook with the same keys, URI, dataset, cluster,
   and model names already created.
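
   A condensed sketch of the notebook's pipeline creation and publishing flow (step, output, and pipeline names are assumptions; automl_config is the AutoMLConfig shown in the Automated ML step above):

```python
from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline, PipelineData, TrainingOutput
from azureml.pipeline.steps import AutoMLStep

ws = Workspace.from_config()
ds = ws.get_default_datastore()

# Pipeline outputs capturing the AutoML run's metrics and best model.
metrics_data = PipelineData(
    name="metrics_data", datastore=ds,
    pipeline_output_name="metrics_output",
    training_output=TrainingOutput(type="Metrics"),
)
model_data = PipelineData(
    name="model_data", datastore=ds,
    pipeline_output_name="best_model_output",
    training_output=TrainingOutput(type="Model"),
)

# Single AutoML training step reusing the earlier AutoMLConfig.
automl_step = AutoMLStep(
    name="automl_module",
    automl_config=automl_config,
    outputs=[metrics_data, model_data],
    allow_reuse=True,
)

pipeline = Pipeline(workspace=ws, steps=[automl_step])
pipeline_run = Experiment(ws, "ml-pipeline").submit(pipeline)
pipeline_run.wait_for_completion()

# Publishing gives the pipeline a REST endpoint so it can be re-run on demand.
published = pipeline_run.publish_pipeline(
    name="Bankmarketing Train",
    description="Bank Marketing training pipeline",
    version="1.0",
)
print(published.endpoint)
```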

Pipeline created

Pipeline endpoint

Published pipeline overview

Jupyter Notebook showing the RunDetails widget

Scheduled pipeline run in ML Studio

8. Documentation:

  The documentation includes:
  
  1. the screencast that shows the entire process of the working ML application.
  2. this README file that describes the project and documents the main steps.

Screen Recording

Screencast video demonstrating the entire process of the working ML application.

Standout Suggestions

  1. I explored the Bank Marketing dataset to better understand its features and granularity, and found a high class imbalance between the two classes, which can impact model performance (see the sketch after this list).
  2. Use of deep learning in AutoML training: I explored the option of enabling deep learning for model training, but community discussion advised against using that option.
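
  A quick way to see the imbalance, assuming a local CSV copy of the dataset (the file name is hypothetical):

```python
import pandas as pd

# Hypothetical local copy of the Bank Marketing training data.
df = pd.read_csv("bankmarketing_train.csv")

# Class proportions of the target column; "no" dominates heavily.
print(df["y"].value_counts(normalize=True))
```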

References

  1. Dealing with imbalanced data in Auto ML
  2. How to consume web services
  3. Swagger User interface documentation
  4. How to deploy model on Azure
