billy-odera / nd00333_AZMLND_Optimizing_a_Pipeline_in_Azure-Starter_Files

Optimizing an ML Pipeline in Azure

Overview

This project is part of the Udacity Azure ML Nanodegree. In this project, we build and optimize an Azure ML pipeline using the Python SDK and a provided Scikit-learn model. This model is then compared to an Azure AutoML run.

Summary

The dataset contains bank marketing data about clients, and we seek to predict whether a client will subscribe to a term deposit. The best-performing model was a VotingEnsemble found by AutoML, with an accuracy of 0.916.

Scikit-learn Pipeline

  1. Set up the training script
    • Import the data
    • Clean the data
    • Split the data into train and test sets
    • Use a scikit-learn logistic regression model for classification

  2. Configure HyperDrive
    • Select a parameter sampler
    • Select a primary metric
    • Select an early termination policy
    • Select an estimator (SKLearn)
    • Allocate compute resources

  3. Save the trained optimized model
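The cleaning and splitting parts of step 1 can be sketched in plain Python so the idea runs without the Azure SDK. This is a minimal sketch that assumes cleaning means one-hot encoding the categorical columns and mapping a yes/no target to 1/0; the column names and rows are hypothetical and are not the project's actual schema or training script.

```python
import random

# Hypothetical sample rows; the real bank marketing dataset has many more columns.
ROWS = [
    {"age": 30, "job": "admin.", "y": "no"},
    {"age": 45, "job": "technician", "y": "yes"},
    {"age": 28, "job": "admin.", "y": "no"},
    {"age": 52, "job": "retired", "y": "yes"},
    {"age": 39, "job": "technician", "y": "no"},
]


def clean(rows, categorical, target="y"):
    """One-hot encode the categorical columns and map the yes/no target to 1/0."""
    # Sorted distinct values per column give a stable feature order.
    levels = {col: sorted({row[col] for row in rows}) for col in categorical}
    features, labels = [], []
    for row in rows:
        # Copy the non-categorical feature columns unchanged.
        x = {k: v for k, v in row.items() if k not in categorical and k != target}
        for col in categorical:
            for level in levels[col]:
                x[f"{col}_{level}"] = 1 if row[col] == level else 0
        features.append(x)
        labels.append(1 if row[target] == "yes" else 0)
    return features, labels


def split_train_test(features, labels, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed and hold out test_size of them."""
    idx = list(range(len(features)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(idx) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def take(seq, ids):
        return [seq[i] for i in ids]

    return (take(features, train_idx), take(features, test_idx),
            take(labels, train_idx), take(labels, test_idx))


X, y = clean(ROWS, categorical=["job"])
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))
```

In a real training script, `pandas.get_dummies` and `sklearn.model_selection.train_test_split` would normally do this work; the hand-written version only makes each step explicit.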

Parameter Sampler

I chose RandomParameterSampling as the parameter sampler because it supports both discrete and continuous hyperparameters, and it also supports early termination of low-performance runs.
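To illustrate why a random sampler handles both kinds of hyperparameter, here is a small sketch: each hyperparameter is a function that draws a value, either from a discrete list or from a continuous range, and every configuration draws each hyperparameter independently. The helpers `choice` and `uniform` only mimic the names of HyperDrive's parameter expressions; they are local stand-ins, and the search space (`--C`, `--max_iter`) is a hypothetical example for a logistic regression script.

```python
import random


def choice(*values):
    """Discrete hyperparameter: draw one of the listed values."""
    return lambda rng: rng.choice(values)


def uniform(low, high):
    """Continuous hyperparameter: draw a float between low and high."""
    return lambda rng: rng.uniform(low, high)


def random_sample(space, n_runs, seed=0):
    """Draw n_runs configurations, sampling every hyperparameter independently."""
    rng = random.Random(seed)
    return [{name: draw(rng) for name, draw in space.items()} for _ in range(n_runs)]


# Hypothetical search space mixing a continuous and a discrete hyperparameter.
SPACE = {"--C": uniform(0.01, 10.0), "--max_iter": choice(50, 100, 200)}

for config in random_sample(SPACE, n_runs=3, seed=1):
    print(config)
```

Grid sampling, by contrast, only works with discrete values, which is why a random sampler suits a continuous setting such as a regularization strength.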

Early Stopping Policy

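I chose BanditPolicy as the early stopping policy. It is based on a slack factor and an evaluation interval: at each interval it terminates any run whose primary metric is not within the slack factor of the best run so far, which avoids spending compute on poorly performing runs.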
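A simplified sketch of the BanditPolicy rule for a metric being maximized, such as accuracy: at every `evaluation_interval`, a run is cancelled if its metric is below `best / (1 + slack_factor)`, where `best` is the top metric reported at that interval. This is an illustration under those assumptions only; it ignores options such as `slack_amount` and `delay_evaluation`, and the run histories are made up.

```python
def bandit_should_stop(metric, best, slack_factor):
    """For a maximized metric, stop when metric < best / (1 + slack_factor)."""
    return metric < best / (1 + slack_factor)


def apply_bandit(histories, slack_factor=0.2, evaluation_interval=1):
    """Return {run: interval at which it was cancelled} for a simplified BanditPolicy.

    histories maps each run name to its metric value at intervals 1, 2, 3, ...
    """
    active = set(histories)
    stopped = {}
    n_intervals = max(len(h) for h in histories.values())
    for t in range(1, n_intervals + 1):
        if t % evaluation_interval:
            continue  # only check on evaluation intervals
        reported = {r: histories[r][t - 1] for r in active if len(histories[r]) >= t}
        if not reported:
            continue
        best = max(reported.values())
        for run, metric in reported.items():
            if bandit_should_stop(metric, best, slack_factor):
                active.discard(run)
                stopped[run] = t
    return stopped


# Made-up accuracy curves for three runs.
HISTORIES = {
    "a": [0.70, 0.80, 0.85],
    "b": [0.68, 0.60, 0.62],
    "c": [0.50, 0.55, 0.58],
}
print(apply_bandit(HISTORIES, slack_factor=0.2))
```

With a slack factor of 0.2 and a best accuracy of 0.80 at the second interval, the cutoff is 0.80 / 1.2 ≈ 0.667, so run `b` (0.60) is cancelled there while run `a` continues; run `c` was already cut at the first interval.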

AutoML

  1. Import the data
  2. Clean the data
  3. Split the data into train and test sets
  4. Configure AutoML
  5. Save the best generated model
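The best AutoML model in this project was a VotingEnsemble. Azure's documentation describes this ensemble as soft voting, a weighted average of the class probabilities predicted by its member models. The sketch below shows that idea for a binary target, together with the accuracy metric reported in the Summary; the member probabilities, weights, and labels are invented for illustration and are not outputs of this project's run.

```python
def soft_vote(member_probs, weights):
    """Weighted average of each member's positive-class probability, per sample."""
    total = sum(weights)
    n_samples = len(member_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, member_probs)) / total
        for i in range(n_samples)
    ]


def predict(avg_probs, threshold=0.5):
    """Convert averaged probabilities to 0/1 class labels."""
    return [1 if p >= threshold else 0 for p in avg_probs]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented probabilities from two hypothetical member models on four clients.
MEMBER_PROBS = [
    [0.9, 0.4, 0.2, 0.8],
    [0.7, 0.6, 0.1, 0.2],
]
WEIGHTS = [2, 1]
Y_TRUE = [1, 1, 0, 1]

avg = soft_vote(MEMBER_PROBS, WEIGHTS)
print(predict(avg), accuracy(Y_TRUE, predict(avg)))
```

The second sample shows the effect of weighting: the second member alone would predict 1 (0.6), but the first member's larger weight pulls the average down to about 0.47.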

Pipeline comparison

Both approaches follow the same data processing steps; the difference lies in how the model is chosen. In approach 1, HyperDrive searches for the optimal hyperparameters of a fixed logistic regression model, while in approach 2, AutoML automatically trains many different models, each with its own tuned hyperparameter values.

Pipeline for both approaches

Results for AutoML

Results for best model

Future work

  1. Address this warning: WARNING:azureml.train.sklearn:'SKLearn' estimator is deprecate (for example, by migrating the training run to ScriptRunConfig)
  2. Try feature engineering to improve model performance

Proof of cluster clean up

Languages: Jupyter Notebook 97.5%, Python 2.5%