DaniGio / ML_Supervised

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Supervised Machine Learning

Project overview

The project aims to support Fast Lending better predict credit risk, it was built different models using machine learning to predict those risks.

Summary

To lead to a more accurate identification of good candidates to loans which lead to lower default rates it was built and evaluated several machine learning models to predict credit risk. Using an imbalanced dataset, it was able to resample the data and build and evaluate logistic regression classifiers using the resampled data.

When naive random oversampling the data, the results were:

  • alt text

When SMOTE oversampling the data, the results were:

  • alt text

When undersampling the data, the results were:

  • alt text

When using a combination approach with the SMOTEENN algorithm, the results were:

  • alt text

And two different ensemble classifiers to predict loan risk and evaluate each model:

BalancedRandomForestClassifier :

  • alt text

EasyEnsembleClassifier

  • alt text

After analizing the different models it is possible to see that the precision, that supports to understand how reliable a positive classification is, shows only high value to "low risk", meaning that we have good chances to identify the "low risk" correctly. But the precision for "high risk" is very low. Continuing the analysis, it is possibel to identify the recall value, also known as sensitivity. This measurement allows us to understand the value of the corrects predictions. From the models above, only EasyEnsembleClassifier shows a high value for "High Risk". The accuracy score between predicted values and actual values are also low for all models. To conclude, none of the models were very accurated. It is possible to identify the "low risk" status, but for this exercise it is recommended to have better values for "High Risk"- since those are the ones that we really should be flagging.

About


Languages

Language:Jupyter Notebook 100.0%