Predict Customer Churn

This contains the code for predicting customer churn for the customers of a bank. The initial prototype model is trained in a jupyter notebook named churn_notebook.ipynb. From this notebook the code has been refactored using best coding practices. Details of that are in the next section.

Project Description

In order for the ML code to be in a deployable state, the first requirement is to put the prototype code into a form which:

Is easily readable: This is achieved by following pep8 guidelines
Is optimized: This is done by changing sub-optimal code constructs into more optimal ones, eg using pandas or numpy operations rather than running for loops
Can be tested: This is done using unit tests and logging
Is modular: This is done by breaking code into functions and modules

Files and data description

The project has the following structure:

┣ data
 ┃ ┗ bank_data.csv
 ┣ images
 ┃ ┣ eda
 ┃ ┃ ┣ churn_hist.png
 ┃ ┃ ┣ cust_age_hist.png
 ┃ ┃ ┣ heatmap.png
 ┃ ┃ ┣ marital_status.png
 ┃ ┃ ┗ total_trans.png
 ┃ ┣ results
 ┃ ┃ ┣ feature_imp.png
 ┃ ┃ ┣ logistic regression_featimp.png
 ┃ ┃ ┣ random forest_featimp.png
 ┃ ┃ ┗ roc_plot.png
 ┣ logs
 ┃ ┗ churn_library.log
 ┣ models
 ┃ ┣ logistic_model.pkl
 ┃ ┗ rfc_model.pkl
 ┣ README.md
 ┣ churn_library.py
 ┣ churn_notebook.ipynb
 ┣ churn_script_logging_and_tests.py
 ┣ constants.py
 ┗ requirements_py3.8.txt

Below is the description of the folder structure:

data: contains the raw data file, on which the model is trained
images: contains the eda as well as model diagnostic plots
logs: contains the log messages generated while running unit tests
models: contains the trained model files
churn_library.py: is the driver program to read, perform eda, feature engineer and train models on the data.
constants.py: defines paths, hyperparameter and relevant feature names
churn_script_logging_and_tests.py: contains the unit tests and logging for testing the churn_library.py

Running Files

To run this project need to first install dependencies. Create a virtual environment with python3.8 using the requirements_py3.8.txt.

Then, run the churn_library.py as follows:

python churn_library.py

This will populate the following folders:

images: this will contain the eda plots in the eda subdirectory and model diagnostic plots in the results subdirectory
models: this will contain the trained models as pickle files

To test this project you need to run churn_script_logging_and_tests.py as follows:

python churn_script_logging_and_tests.py

This will populate the following folders:

images: this will contain the eda plots in the eda subdirectory and model diagnostic plots in the results subdirectory
models: this will contain the trained models as pickle files
logs: this will contain the log messages for the test cases run in the file

Gunnvant / good_ml_code

Predict Customer Churn

Project Description

Files and data description

Running Files

About

Languages