This project demonstrates how to apply MLOps principles in Azure. MLOps (machine learning operations) applies DevOps principles to machine learning: deploying models, consuming endpoints, and automating pipelines.
In this project, we work with a bank marketing dataset to predict whether a potential client will subscribe to the bank's term deposit. A model is trained using AutoML, then deployed and consumed via a REST endpoint.
(Workflow Taken from Udacity)
There was no need to authenticate, as I used the environment provided by Udacity, where authentication was already configured.
As mentioned earlier, we used the bank marketing dataset in this project.
We create an AutoML run on this dataset to solve a classification problem and generate an explanation of the best model.
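For reference, the AutoML run was configured along these lines (a minimal sketch using the Azure ML SDK v1; the dataset name, compute target, and timeout are assumptions, and `y` is the label column of the bank marketing dataset):

```python
from azureml.core import Workspace, Dataset, Experiment
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name="bankmarketing")  # assumed dataset name

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="accuracy",
    training_data=dataset,
    label_column_name="y",
    n_cross_validations=5,
    compute_target="cpu-cluster",      # assumed cluster name
    experiment_timeout_minutes=30,
    enable_early_stopping=True,
)

experiment = Experiment(ws, "bankmarketing-automl")
run = experiment.submit(automl_config, show_output=True)
best_run, fitted_model = run.get_output()  # best child run and its fitted model
```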
The best model was a Voting Ensemble with an accuracy of 92%.
The Voting Ensemble model was then deployed to Azure Container Instances (ACI) with authentication enabled.
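A sketch of that deployment step, continuing from the AutoML sketch above (the model and service names are placeholders; `auth_enabled=True` is what turns on key-based authentication for the ACI service):

```python
from azureml.core.model import Model, InferenceConfig
from azureml.core.webservice import AciWebservice

# ws and best_run come from the AutoML sketch above
model = best_run.register_model(
    model_name="bankmarketing-best", model_path="outputs/model.pkl"
)

inference_config = InferenceConfig(
    entry_script="score.py",                 # scoring script from the AutoML run
    environment=best_run.get_environment(),
)

aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    auth_enabled=True,  # the authentication toggle mentioned above
)

service = Model.deploy(ws, "bankmarketing-service", [model], inference_config, aci_config)
service.wait_for_deployment(show_output=True)
```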
After deploying the best model, logs.py was executed to enable Application Insights and retrieve the logs.
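Roughly what logs.py does, assuming the service name from the deployment sketch above:

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(ws, name="bankmarketing-service")  # assumed service name

service.update(enable_app_insights=True)  # turn on Application Insights
print(service.get_logs())                 # retrieve the service logs
```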
Swagger is a tool that displays the contents of an API in an easy-to-read manner. A swagger.json file was downloaded from Azure, and serve.py and swagger.sh were executed to run Swagger on localhost. The images below show Swagger running on localhost and the responses of the model.
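For context, serve.py is essentially a small HTTP server that makes swagger.json reachable by the Swagger UI container started by swagger.sh; a minimal sketch (the port is an assumption):

```python
# Serve the directory containing swagger.json over HTTP on localhost.
from http.server import HTTPServer, SimpleHTTPRequestHandler

PORT = 8000  # assumed port
httpd = HTTPServer(("0.0.0.0", PORT), SimpleHTTPRequestHandler)
print(f"Serving swagger.json on http://localhost:{PORT}")
httpd.serve_forever()
```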
To interact with the deployed model, we copy the REST endpoint and the primary key of the model into the endpoint.py script and then run the script. The screenshot below shows the responses returned by the model.
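A sketch of what endpoint.py does (the scoring URI, primary key, and feature values are placeholders):

```python
import json
import requests

scoring_uri = "http://<your-aci-endpoint>.azurecontainer.io/score"  # REST endpoint
key = "<primary-key>"                                               # primary key

data = {
    "data": [
        {"age": 17, "job": "blue-collar", "marital": "married", "education": "university.degree"},
        # ... remaining features for each record
    ]
}

headers = {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}
response = requests.post(scoring_uri, json=data, headers=headers)
print(response.json())

# endpoint.py also writes the payload to data.json, which benchmark.sh reuses
with open("data.json", "w") as f:
    json.dump(data, f)
```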
To benchmark the model, we use the data.json file created by the endpoint.py script. benchmark.sh then uses this JSON file to measure the endpoint's performance.
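benchmark.sh wraps the Apache Benchmark (ab) tool; as a rough Python stand-in for the same idea, you can time repeated requests against the endpoint yourself (URI and key are placeholders):

```python
import json
import time
import requests

scoring_uri = "http://<your-aci-endpoint>.azurecontainer.io/score"
headers = {"Content-Type": "application/json", "Authorization": "Bearer <primary-key>"}

with open("data.json") as f:
    payload = json.load(f)

# Replay the payload a few times and record per-request latency
timings = []
for _ in range(10):
    start = time.perf_counter()
    requests.post(scoring_uri, json=payload, headers=headers)
    timings.append(time.perf_counter() - start)

print(f"mean latency: {sum(timings) / len(timings):.3f}s, max: {max(timings):.3f}s")
```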
To create and publish a pipeline, we run all the cells in the Jupyter notebook. Below are screenshots that show the results of running the Jupyter notebook.
The first step is to create a pipeline:
The pipeline endpoint:
Dataset Used:
We then publish the pipeline, using the RunDetails widget to monitor its progress.
In the published pipeline overview, we can see the status (ACTIVE) and the REST endpoint.
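The publish-and-trigger flow from the notebook looks roughly like this (a sketch assuming `pipeline_run` comes from earlier notebook cells; the pipeline and experiment names are assumptions):

```python
import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Publish the completed pipeline run so it gets a REST endpoint
published_pipeline = pipeline_run.publish_pipeline(
    name="Bankmarketing-AutoML-Pipeline",
    description="Trains an AutoML model on the bank marketing dataset",
    version="1.0",
)

# Trigger the published pipeline via its REST endpoint using an AAD token
auth = InteractiveLoginAuthentication()
aad_headers = auth.get_authentication_header()

response = requests.post(
    published_pipeline.endpoint,
    headers=aad_headers,
    json={"ExperimentName": "pipeline-rest-endpoint"},
)
print("Triggered run id:", response.json().get("Id"))
```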
- Various scripts were run from the terminal. This could be streamlined by running them from the Jupyter notebook or by combining them into a single script.
- Use ParallelRunStep to add a pipeline step that processes large amounts of data asynchronously and in parallel (see the sketch after this list).
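A hedged sketch of how such a step could look (all names, the compute target, and the batch_scoring.py entry script are hypothetical, not part of this project):

```python
from azureml.core import Environment
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep

# ws and dataset as in the AutoML sketch above
batch_env = Environment.from_conda_specification("batch-env", "environment.yml")  # assumed env file

parallel_run_config = ParallelRunConfig(
    source_directory=".",
    entry_script="batch_scoring.py",   # hypothetical script implementing init()/run(mini_batch)
    mini_batch_size="5",
    error_threshold=10,
    output_action="append_row",
    environment=batch_env,
    compute_target="cpu-cluster",      # assumed cluster name
    node_count=2,
)

parallel_step = ParallelRunStep(
    name="batch-score-bankmarketing",
    parallel_run_config=parallel_run_config,
    inputs=[dataset.as_named_input("bankmarketing")],
    output=OutputFileDatasetConfig(name="scores"),
)

pipeline = Pipeline(workspace=ws, steps=[parallel_step])
```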