Optimizing an ML Pipeline in Azure

Overview

This project is part of the Udacity Azure ML Nanodegree. In this project, we build and optimize an Azure ML pipeline using the Python SDK and a provided Scikit-learn model. This model is then compared to an Azure AutoML run.

Summary

This dataset contains data about bank marketing. We seek to predict whether a client will subscribe to a term deposit. The best-performing model was the VotingEnsemble produced by AutoML, with an accuracy of 0.916.

Scikit-learn Pipeline

Setup Training Script
- Import data
- Cleaning of data
- Splitting data into train/test
- Using scikit-learn logistic regression model for classification
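
The training steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it uses synthetic data from `make_classification` in place of the bank marketing dataset, and the function name `train_and_score` and the hyperparameters `C` and `max_iter` are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_and_score(C=1.0, max_iter=100, seed=0):
    """Fit a logistic regression model and return its test-set accuracy."""
    # Synthetic stand-in for the cleaned bank marketing features and labels.
    X, y = make_classification(n_samples=500, n_features=10, random_state=seed)
    # Hold out part of the data for evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    # C is the inverse regularization strength; max_iter caps solver iterations.
    model = LogisticRegression(C=C, max_iter=max_iter)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


accuracy = train_and_score(C=0.5, max_iter=200)
print(f"Accuracy: {accuracy:.3f}")
```

In a Hyperdrive setup, values such as `C` and `max_iter` would typically arrive as script arguments so that each run can try a different combination.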

Configuration of Hyperdrive
- Selection of parameter sampler
- Selection of primary metric
- Selection of early termination policy
- Selection of estimator (SKLearn)
- Allocation of resources

Save the trained optimized model

Parameter Sampler

The parameter sampler I chose was RandomParameterSampling because it supports both discrete and continuous hyperparameters.
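
To illustrate what random sampling over a mixed search space means, here is a small standard-library sketch. It does not use the Azure ML SDK; it only mimics the idea behind the SDK's `uniform` and `choice` parameter expressions. The hyperparameter names (`C`, `max_iter`), their ranges, and the helper `sample_config` are assumptions made for this example.

```python
import random


def sample_config(space, rng):
    """Draw one configuration from a mixed discrete/continuous search space."""
    config = {}
    for name, (kind, values) in space.items():
        if kind == "uniform":
            # Continuous hyperparameter: draw a float between low and high.
            low, high = values
            config[name] = rng.uniform(low, high)
        elif kind == "choice":
            # Discrete hyperparameter: pick one of the listed options.
            config[name] = rng.choice(values)
        else:
            raise ValueError(f"unknown distribution: {kind}")
    return config


# Hypothetical search space: one continuous and one discrete hyperparameter.
search_space = {
    "C": ("uniform", (0.01, 10.0)),
    "max_iter": ("choice", [50, 100, 200]),
}

rng = random.Random(42)  # seeded so the draws are reproducible
configs = [sample_config(search_space, rng) for _ in range(5)]
for config in configs:
    print(config)
```

Each draw is independent, so the sampler can cover a continuous range without enumerating a grid.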

Early Stopping Policy

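The early stopping policy I chose was BanditPolicy. It is driven by a slack factor and an evaluation interval: at each evaluation interval, any run whose primary metric falls outside the slack factor relative to the best-performing run is terminated, so compute is not spent on unpromising runs.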
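
The slack-factor rule can be sketched in plain Python, assuming the primary metric is being maximized. This is a simplified illustration of the decision logic, not the SDK's implementation; the function names, run names, and metric values are made up for the example.

```python
def should_terminate(run_metric, best_metric, slack_factor):
    """Return True if a run falls outside the allowed slack of the best run."""
    # For a maximized metric, the run is cancelled when even its metric
    # inflated by the slack factor is still below the best metric so far.
    return run_metric * (1 + slack_factor) < best_metric


def apply_bandit(metrics_by_run, interval, slack_factor, evaluation_interval):
    """List the runs to cancel at this interval (empty if the policy is not applied)."""
    # The policy is only applied every `evaluation_interval` intervals.
    if interval % evaluation_interval != 0:
        return []
    best = max(metrics_by_run.values())
    return [
        run
        for run, metric in metrics_by_run.items()
        if should_terminate(metric, best, slack_factor)
    ]


# Hypothetical accuracies reported by three concurrent runs.
metrics = {"run_a": 0.90, "run_b": 0.85, "run_c": 0.70}

cancelled = apply_bandit(metrics, interval=2, slack_factor=0.1, evaluation_interval=2)
skipped = apply_bandit(metrics, interval=3, slack_factor=0.1, evaluation_interval=2)
print(cancelled, skipped)
```

With a slack factor of 0.1, `run_b` survives because 0.85 * 1.1 is above the best accuracy of 0.90, while `run_c` is cancelled because 0.70 * 1.1 is not.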

AutoML

- Import data
- Cleaning of data
- Splitting data into train/test
- Configuration of AutoML
- Saving the best model generated

Pipeline comparison

Both approaches follow the same data processing steps; the difference is in their configuration details. In approach 1, we use the Hyperdrive tool to find optimal hyperparameters for a single model, while in approach 2, AutoML automatically generates different models, each with its own optimal hyperparameter values.

Pipeline for both approaches

Results for AutoML

Results for best model

Future work

- Resolve this warning: WARNING:azureml.train.sklearn:'SKLearn' estimator is deprecate
- Feature engineering

Proof of cluster clean up
