This project is part of the Udacity Azure ML Nanodegree. In this project, we build and optimize an Azure ML pipeline using the Python SDK and a provided Scikit-learn model. This model is then compared to an Azure AutoML run.
This dataset contains data about bank marketing. We seek to predict whether a client will subscribe to a term deposit. The best-performing model was a VotingEnsemble found by AutoML, with an accuracy of 0.916.
- Setup Training Script
- Import data
- Cleaning of data
- Splitting data into train/test
- Using scikit-learn logistic regression model for classification
- Configuration of Hyperdrive
- Selection of parameter sampler
- Selection of primary metric
- Selection of early termination policy
- Selection of estimator (SKLearn)
- Allocation of resources
- Save the trained optimized model
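The cleaning and splitting steps of the training script can be sketched with the standard library alone. This is a minimal illustration, not the project's actual script: the inline CSV, its column names (`age`, `housing`, `y`), and the helper names `clean_row` and `train_test_split` are all hypothetical, and the real script would hand the cleaned data to a scikit-learn logistic regression model.

```python
import csv
import io
import random

# Hypothetical stand-in for the bank marketing data; real column names may differ.
RAW_CSV = """age,housing,y
30,yes,no
45,no,yes
52,yes,no
28,no,no
61,yes,yes
"""


def clean_row(row):
    """Convert text fields to numbers: age to int, yes/no answers to 1/0."""
    return {
        "age": int(row["age"]),
        "housing": 1 if row["housing"] == "yes" else 0,
        "y": 1 if row["y"] == "yes" else 0,
    }


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Import, clean, then split.
rows = [clean_row(r) for r in csv.DictReader(io.StringIO(RAW_CSV))]
train_rows, test_rows = train_test_split(rows)
```

Fixing the shuffle seed keeps the split reproducible, so Hyperdrive runs with different hyperparameters are evaluated on the same held-out rows.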
Parameter Sampler
The parameter sampler I chose was RandomParameterSampling because it supports both discrete and continuous hyperparameters.
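To illustrate what random sampling over a mixed search space means, here is a minimal pure-Python sketch. It does not use the Azure ML SDK; it only mimics the idea behind RandomParameterSampling. The hyperparameter names `C` and `max_iter`, their ranges, and the helper `sample_config` are assumptions chosen for illustration.

```python
import random

# Hypothetical search space: C is continuous, max_iter is discrete.
SEARCH_SPACE = {
    "C": ("uniform", 0.01, 10.0),
    "max_iter": ("choice", [50, 100, 200]),
}


def sample_config(space, rng):
    """Draw one hyperparameter configuration at random from the space."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "uniform":
            _, low, high = spec
            config[name] = rng.uniform(low, high)  # continuous value
        elif kind == "choice":
            config[name] = rng.choice(spec[1])  # one of the discrete options
        else:
            raise ValueError(f"unknown distribution: {kind}")
    return config


rng = random.Random(42)  # seeded so the draws are reproducible
configs = [sample_config(SEARCH_SPACE, rng) for _ in range(5)]
```

Each entry of `configs` would correspond to one training run. Unlike a grid search, the number of runs is chosen independently of the size of the search space.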
Early Stopping Policy
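The early stopping policy I chose was BanditPolicy because, at each evaluation interval, it terminates any run whose primary metric falls outside the slack factor relative to the best-performing run, which saves compute on unpromising runs.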
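The slack-factor rule can be sketched in a few lines of plain Python. This is a simplified illustration, not the Azure ML implementation: it assumes the primary metric is being maximized and that the cutoff is the best metric divided by (1 + slack factor). The function names and run names are hypothetical.

```python
def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric a run may have and still continue (maximization goal)."""
    return best_metric / (1 + slack_factor)


def runs_to_terminate(metrics_by_run, slack_factor):
    """Return the names of runs whose metric falls below the cutoff."""
    best = max(metrics_by_run.values())
    cutoff = bandit_cutoff(best, slack_factor)
    return sorted(run for run, metric in metrics_by_run.items() if metric < cutoff)


# Metrics reported by three hypothetical runs at one evaluation interval.
metrics = {"run_a": 0.80, "run_b": 0.70, "run_c": 0.60}
stopped = runs_to_terminate(metrics, slack_factor=0.2)
```

With a best accuracy of 0.80 and a slack factor of 0.2, the cutoff is about 0.667, so only `run_c` is stopped. A smaller slack factor makes the policy more aggressive.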
- Import data
- Cleaning of data
- Splitting of data into train and test data
- Configuration of AutoML
- Save the best model generated
Both approaches follow the same data processing steps; the difference is in their configuration details. In approach 1, we use Hyperdrive to find optimal hyperparameters for a single logistic regression model, while in approach 2, AutoML automatically generates different models, each with its own optimal hyperparameter values.
- Resolve this warning: WARNING:azureml.train.sklearn:'SKLearn' estimator is deprecate
- feature engineering