Jitesh117 / Credit_card_Score_Classifier

Credit Score Classifier leveraging diverse machine learning models including KNN, Random Forests, XGBoost, LightGBM, SVC, and ensemble techniques, achieving 79.24% accuracy with hyperparameter tuning for enhanced predictive performance

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Table of Contents

Introduction

  • Developed a Credit Score Classifier with 79.24% accuracy using various Machine Learning Models such as :
  • Employed various Classification models such as KNN, Random Forests, XGBoost, LightGBM, SVC, etc.
  • Did Hyperparameter tuning on the KNN model to achieve even better results
  • Employed Ensemble Modeling to improve the predictive performance of the individual models.

Data Cleaning and handling Missing values

Feature Count Unique Top Freq
Customer_ID 100,000 12,500 CUS_0xd40 8
Month 100,000 8 January 12,500
Age 100,000 1,788 38 2,833
Occupation 100,000 16 _______ 7,062
Annual_Income 100,000 18,940 36,585.12 16
Num_of_Loan 100,000 434 3 14,386
Type_of_Loan 88,592 6,260 Not Specified 1,408
Num_of_Delayed_Payment 92,998 749 19 5,327
Changed_Credit_Limit 100,000 4,384 _ 2,091
Credit_Mix 100,000 4 Standard 36,479
Outstanding_Debt 100,000 13,178 1,360.45 24
Credit_History_Age 90,970 404 15 Years and 11 Months 446
Payment_of_Min_Amount 100,000 3 Yes 52,326
Amount_invested_monthly 95,521 91,049 10,000 4,305
Payment_Behaviour 100,000 7 Low_spent_Small_value_payments 25,513
Monthly_Balance 98,800 98,792 -333333333333333333333333333 9
Credit_Score 100,000 3 Standard 5,317
  • As can be seen from the above table, there were too many inconsistencies and errors in the dataset which had to be cleaned
  • To tackle data cleaning, I created custom functions to efficiently clean both the numerical and categorical columns.

EDA

The insights gleaned from the pivot tables highlight key findings from this comprehensive analysis.

BoxPlot of Numeric Columns

Pair Plot

PairPlot of Payment_Behavior against other important features

Pair Plot

Outlier Detection and Removal

Before After
Before Image After Image

How I dealt with un-balanced Distribution

  • The data set showed unbalance distribution. This may cause a biased estimate.
  • So we will use SMOTE, an oversampling process that allows synthetic data to be generated.
sm = SMOTE(random_state=2)
smote_train_X, smote_train_Y = sm.fit_resample(X_train, y_train)

Model Selection

After training various models on the Dataset, I came to the following conclusion:

Model Success
KNN 78.105
RF 77.650
BC 75.010
XGB 72.745
LightGBM 69.750
SVC 69.435

Pair Plot

Results

Precision Recall F1-Score Support
0 0.77 0.86 0.81 5,874
1 0.85 0.75 0.80 10,599
2 0.69 0.82 0.75 3,527
Accuracy 0.79 20,000
Macro Avg 0.77 0.81 0.79 20,000
Weighted Avg 0.80 0.79 0.79 20,000

About

Credit Score Classifier leveraging diverse machine learning models including KNN, Random Forests, XGBoost, LightGBM, SVC, and ensemble techniques, achieving 79.24% accuracy with hyperparameter tuning for enhanced predictive performance


Languages

Language:Jupyter Notebook 100.0%