Project Predict Customer Churn of ML DevOps Engineer Nanodegree Udacity.
This is a project to predict customer churn using credit card data from Kaggle. The data card describes the problem that this data set can help tackling as such:
A manager at the bank is disturbed with more and more customers leaving their credit card services. They would really appreciate if one could predict for them who is gonna get churned so they can proactively go to the customer to provide them better services and turn customers' decisions in the opposite direction.
The focus of this project is to display clean code principles: writing code that is clean, modular, documented, follows style guidelines, and is tested.
The main folders and files of the project are:
.
│ .gitignore
│ .pylintrc
│ churn_library.py
│ churn_script_logging_and_tests.py
│ conftest.py
│ README.md
│ requirements_py310.txt
├───data
│ bank_data.csv
│
├───images
│ │ sequencediagram.jpeg
│ │
│ ├───eda
│ │ churn_distribution.png
│ │ customer_age_distribution.png
│ │ heatmap.png
│ │ marital_status_distribution.png
│ │ total_trans_ct_distribution.png
│ │
│ └───results
│ LogisticRegression_results.png
│ LogisticRegression_shap_feature_importances.png
│ RandomForestClassifier_feature_importances.png
│ RandomForestClassifier_results.png
│ RandomForestClassifier_shap_feature_importances.png
│ ROC_LogisticRegression_RandomForestClassifier.png
│
├───logs
│ churn_library.log
│
├───models
│ LogisticRegression.pkl
│ RandomForestClassifier.pkl
│
├───notebooks
│ churn_notebook.ipynb
│ Guide.ipynb
│
└───src
constants.py
models.py
plotting.py
__init__.py
The most important folders are:
data
: contains the input data set with credit card dataimages
: contains output plots for EDA and model evaluationslogs
: contains logs displaying info, warning, and error events when running the library and test suitemodels
: contains trained models
The main modules to run are churn_library.py
and churn_script_logging_and_tests.py
.
The following diagram displays the main flow of churn_library.py
:
Clone this repository using the following command:
git clone https://github.com/EdwinWenink/customer_churn.git
NOTE: This project requires Python 3.10. You can create a new conda environment with Python 3.10 using:
conda create -n churn python=3.10
This project relies on the following packages:
scikit-learn
shap==0.41.0
joblib
pandas
numpy==1.23.0
matplotlib
seaborn
pylint
autopep8
You can install dependencies in your environment of choice using the requirements file for Python 3.10:
pip install -r requirements_py310.txt
NOTE: The shap
package is not maintained well and currently only compatible with numpy
<= 1.23.
After installing the required dependencies, you can run the churning pipeline by calling the main module:
python churn_library.py
Or:
ipython churn_library.py
By default, the outputs of the pipeline will be visible both in the command line and under logs\churn_library.log
.
If you specifically want to inspect test functions, you can invoke pylint
as follows:
pylint churn_script_logging_and_tests.py
This testing script can also be invoked directly using python
or ipython
.
The testing results will also be logged under logs\churn_library.log
.