JeremyKRay / Credit_Risk_Analysis

The purpose of this analysis was to compare several machine learning models to determine which is best at predicting credit risk. The dataset comes from LendingClub, a peer-to-peer lending services company. Credit risk is an inherently unbalanced classification problem: good loans greatly outnumber risky loans. For this reason, we use six different machine learning techniques to train and evaluate models on unbalanced classes.

The techniques fall into four categories, and all are imported from the imbalanced-learn library. For oversampling, we employ the RandomOverSampler and SMOTE algorithms. For undersampling, we employ the ClusterCentroids algorithm. For a combination of over- and undersampling, we employ the SMOTEENN algorithm. Finally, for ensemble learning, we employ the BalancedRandomForestClassifier and EasyEnsembleClassifier algorithms. These last two are fairly new and reduce bias.

The scikit-learn library is used to train and test the models and to compare all of the techniques. For each technique it produces a balanced accuracy score along with precision and recall scores, which together determine whether the technique is suitable for predicting credit risk.
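The core idea of random oversampling can be sketched in plain Python without any library. This minimal sketch assumes the features are stored as a list of rows with a parallel list of labels. The function name `random_oversample` and the toy data are hypothetical and made up for illustration; they are not taken from the LendingClub dataset. Rows from each smaller class are drawn at random with replacement until every class matches the majority count. In the analysis itself this step is handled by imbalanced-learn's `RandomOverSampler`, whose `fit_resample(X, y)` method returns the resampled data. SMOTE differs in that it synthesizes new minority points by interpolating between a minority sample and its nearest minority-class neighbors instead of copying existing rows.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Hypothetical helper: duplicate random minority-class rows until classes are equal.

    X is a list of feature rows and y is the matching list of labels.
    Returns new (X, y) lists; the inputs are left unchanged.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # every class is grown to the majority count
    X_out, y_out = [list(row) for row in X], list(y)
    for label, count in counts.items():
        rows = [row for row, lab in zip(X, y) if lab == label]
        # Sample with replacement to make up this class's shortfall (0 for the majority).
        for _ in range(target - count):
            X_out.append(list(rng.choice(rows)))
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data (not LendingClub data): 6 good loans, 2 risky loans.
X = [[1.0], [1.2], [0.9], [1.1], [1.3], [1.0], [5.0], [5.5]]
y = ["low_risk"] * 6 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
print(sorted(Counter(y_res).items()))  # [('high_risk', 6), ('low_risk', 6)]
```

Because the minority rows are exact copies, random oversampling can encourage overfitting to those few rows, which is one motivation for trying SMOTE and the other techniques alongside it.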
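The two ensemble classifiers rely on a related idea: each member of the ensemble is trained on a balanced sample of the data. The sketch below shows one way to draw such a sample. It follows the commonly described balanced random forest procedure, which draws the same number of rows, with replacement, from every class, sized to the minority class. It is an illustrative assumption, not imbalanced-learn's implementation. `balanced_bootstrap` is a hypothetical helper, and training the individual trees (or, for `EasyEnsembleClassifier`, the AdaBoost learners) on each sample is left out.

```python
import random
from collections import Counter


def balanced_bootstrap(y, seed=None):
    """Hypothetical helper: return row indices for one balanced bootstrap sample.

    Every class is drawn with replacement and sized to the minority-class
    count, so majority-class rows are effectively undersampled.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(indices) for indices in by_class.values())
    sample = []
    for indices in by_class.values():
        sample.extend(rng.choice(indices) for _ in range(n_min))
    return sample


# Hypothetical labels: 95 good loans and 5 risky loans.
y = ["low_risk"] * 95 + ["high_risk"] * 5

# Each ensemble member would be trained on a different balanced sample.
for seed in range(3):
    idx = balanced_bootstrap(y, seed=seed)
    print(sorted(Counter(y[i] for i in idx).items()))  # [('high_risk', 5), ('low_risk', 5)]
```

Drawing a fresh balanced sample per ensemble member means that, across the whole ensemble, most majority-class rows are still seen by some learner, even though each individual learner sees a small, balanced subset.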
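Balanced accuracy matters here because plain accuracy rewards a model that simply predicts the majority class. The sketch below uses plain Python with hypothetical labels and helper names. It computes balanced accuracy as the mean of per-class recall, together with precision for a single class. A model that labels every loan `low_risk` reaches 0.95 accuracy on a 95:5 split, but only 0.5 balanced accuracy and 0 recall on risky loans. scikit-learn provides these metrics as `balanced_accuracy_score`, `precision_score`, and `recall_score` in `sklearn.metrics`.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: recall}, the fraction of each true class predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def precision(y_true, y_pred, label):
    """Fraction of rows predicted as `label` that truly are `label` (0.0 if none)."""
    predicted_as_label = [t for t, p in zip(y_true, y_pred) if p == label]
    if not predicted_as_label:
        return 0.0
    return predicted_as_label.count(label) / len(predicted_as_label)


# Hypothetical labels: 95 good loans and 5 risky loans.
y_true = ["low_risk"] * 95 + ["high_risk"] * 5
# A naive model that always predicts the majority class.
y_pred = ["low_risk"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy:            {accuracy:.2f}")                                      # 0.95
print(f"balanced accuracy:   {balanced_accuracy(y_true, y_pred):.2f}")              # 0.50
print(f"high_risk recall:    {per_class_recall(y_true, y_pred)['high_risk']:.2f}")  # 0.00
print(f"high_risk precision: {precision(y_true, y_pred, 'high_risk'):.2f}")         # 0.00
```

For credit risk, recall on the risky class is usually the number to watch, since a missed risky loan tends to cost more than a good loan flagged for review.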

Jeremy Ray
JeremyKRay