balancedrandomforestclassifier credit-risk machine-learning randomoversampler smote-sampling

Credit_Risk_Analysis

Overview of the analysis:

This exercise trains and evaluates several machine learning models that predict credit risk from data with unbalanced classes. The analysis uses the following algorithms:

RandomOverSampler and SMOTE, which oversample the minority class.
ClusterCentroids, which undersamples the majority class.
SMOTEENN, which combines oversampling and undersampling to resample the training data.
BalancedRandomForestClassifier and EasyEnsembleClassifier, ensemble classifiers used to reduce bias.
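
To make the resampling idea concrete, here is a minimal standard-library sketch of random oversampling. It only illustrates the principle (duplicating minority rows until the classes are the same size) and is not the implementation behind imbalanced-learn's RandomOverSampler. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes
    are as large as the largest one. Illustrative only."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows that belong to this class in the original data.
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 8 low_risk rows and 2 high_risk rows.
rows = [[i] for i in range(10)]
labels = ["low_risk"] * 8 + ["high_risk"] * 2

res_rows, res_labels = random_oversample(rows, labels)
balanced_counts = Counter(res_labels)
print(balanced_counts)
```

Resampling of this kind is applied to the training data only, so that the test set keeps the original class imbalance.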

Results:

We compare the models using the balanced accuracy score, the confusion matrix, and the imbalanced classification report.
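
As a rough sketch of how these three outputs relate, the snippet below builds a two-class confusion matrix by hand and derives the scores from it. It assumes high_risk is the positive class and uses a rows = actual, columns = predicted layout. The helper `binary_report` and the toy labels are hypothetical and stand in for the library functions used in the notebook.

```python
def binary_report(y_true, y_pred, positive="high_risk"):
    """Return the confusion counts and derived scores for one positive class."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        # Guard against empty classes or no positive predictions.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall of the positive class
    specificity = ratio(tn, tn + fp)   # recall of the negative class
    precision = ratio(tp, tp + fp)
    return {
        "confusion_matrix": [[tp, fn], [fp, tn]],
        # Balanced accuracy is the mean of the per-class recalls.
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical toy labels: 2 high_risk and 8 low_risk cases.
y_true = ["high_risk"] * 2 + ["low_risk"] * 8
y_pred = ["high_risk", "low_risk"] + ["high_risk"] * 2 + ["low_risk"] * 6
report = binary_report(y_true, y_pred)
print(report)
```

Because balanced accuracy averages the recall of each class, it is not inflated by the large low_risk class the way plain accuracy would be.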

RandomOverSampler

The balanced accuracy score is 62%.
The high_risk precision is only about 1% with a sensitivity of 60%, which gives an F1 score of only 2%.
Because the low_risk class far outnumbers the high_risk class, the low_risk precision is almost 100%, with a sensitivity of 65%.
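
The figures above can be cross-checked with a little arithmetic. This sketch assumes the standard definitions (F1 as the harmonic mean of precision and sensitivity, balanced accuracy as the mean of the two class sensitivities) and plugs in the rounded RandomOverSampler values, so the results are only approximate. The helper name `f1_from` is hypothetical.

```python
def f1_from(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    total = precision + sensitivity
    return 2 * precision * sensitivity / total if total else 0.0


# Rounded values reported above for RandomOverSampler.
high_risk_precision = 0.01
high_risk_sensitivity = 0.60
low_risk_sensitivity = 0.65

f1_high_risk = f1_from(high_risk_precision, high_risk_sensitivity)
balanced_accuracy = (high_risk_sensitivity + low_risk_sensitivity) / 2

print(f"F1 (high_risk): {f1_high_risk:.3f}")
print(f"Balanced accuracy: {balanced_accuracy:.3f}")
```

The harmonic mean is dominated by the smaller of its two inputs, which is why a 1% precision keeps the F1 score near 2% even with 60% sensitivity.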

SMOTE

The balanced accuracy score is 65%.
The high_risk precision is only about 1% with a sensitivity of 64%, which gives an F1 score of only 2%.
Because the low_risk class far outnumbers the high_risk class, the low_risk precision is almost 100%, with a sensitivity of 66%.
The results are very similar to those of RandomOverSampler.

ClusterCentroids

The balanced accuracy score drops to 52%.
The high_risk precision is only about 1% with a sensitivity of 59%, which gives an F1 score of only 1%.
Because the low_risk class far outnumbers the high_risk class, the low_risk precision is almost 100%, with a sensitivity of 46%.

Combinatorial SMOTEENN

The balanced accuracy score is 62%.
The high_risk precision is only about 1% with a sensitivity of 69%, which gives an F1 score of 2%.
Because the low_risk class far outnumbers the high_risk class, the low_risk precision is almost 100%, with a sensitivity of 54%.

BalancedRandomForestClassifier

The balanced accuracy score improves greatly to 79%.
The high_risk precision is still only about 4% with a sensitivity of 67%, which gives an F1 score of 7%.
Because fewer low_risk cases are falsely flagged as high_risk, the low_risk sensitivity rises to 91%, and the low_risk precision remains almost 100%.

EasyEnsembleClassifier

The balanced accuracy score is very high at 93%.
The high_risk precision is still only about 7% with a sensitivity of 91%, which gives an F1 score of 14%.
Because fewer low_risk cases are falsely flagged as high_risk, the low_risk sensitivity rises to 94%, and the low_risk precision remains almost 100%.

Summary:

All the models we used to predict credit risk show weak precision in identifying high-risk credit.
The ensemble models show a great improvement, especially in the sensitivity for high-risk credit.
The EasyEnsembleClassifier model detects almost all high-risk credit. On the other hand, because its precision is low, many low-risk credits are still falsely flagged as high risk, which may cost the bank business opportunities.
The bank may want to consider models other than those above for predicting credit risk.
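
The combination of high sensitivity and low precision noted in this summary is largely a consequence of how rare the high_risk class is. The sketch below shows how precision depends on the share of high_risk cases when the two class sensitivities are held at the values reported for EasyEnsembleClassifier (91% and 94%). The prevalence values are hypothetical and are not taken from the dataset, and the function name `precision_from_rates` is illustrative.

```python
def precision_from_rates(sensitivity, specificity, prevalence):
    """Precision of the positive class given its sensitivity, the negative
    class's sensitivity (specificity), and the positive class's share."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


# Hypothetical high_risk shares; 0.91 and 0.94 are the reported sensitivities.
precisions = {
    prevalence: precision_from_rates(0.91, 0.94, prevalence)
    for prevalence in (0.005, 0.05, 0.5)
}

for prevalence, value in precisions.items():
    print(f"high_risk share {prevalence:.1%}: precision {value:.1%}")
```

With a very small high_risk share, even a few percent of misclassified low_risk cases outnumber the true high_risk detections, so precision stays low unless the false positive rate falls much further.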

About

Use different techniques to train and evaluate different machine learning models to predict credit risk with unbalanced classes

https://echoqshen.github.io/Credit_Risk_Analysis/

Languages

Jupyter Notebook: 100.0%