machine-learning python scikit-learn data-science pandas credit-risk

Home Credit Default Risk [Machine Learning Project]

This is a Python-based implementation of two types of machine learning models (listed below) for the "Home Credit Default Risk" task.

Language and Libraries

About Dataset:

Home Credit Default Risk dataset from Kaggle Competitions

Data Set Information

Table of Contents:

Data Loading

!kaggle competitions download home-credit-default-risk

Exploratory Data Analysis

Checking Missing Values (the data contains many null values, which need to be cleaned or replaced using imputation techniques)
Checking Duplicate Data (number of duplicates in the data: 0)
Data Visualization
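The missing-value and duplicate checks above could look roughly like the following. This is a minimal sketch on a toy frame standing in for the application data; the values are invented for illustration, and the variable names (`df`, `missing_pct`, `n_duplicates`) are assumptions, not the notebook's own.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the application table (values are made up).
df = pd.DataFrame({
    "SK_ID_CURR": [100001, 100002, 100003, 100004],
    "TARGET": [0, 1, 0, 0],
    "AMT_INCOME_TOTAL": [135000.0, np.nan, 99000.0, 202500.0],
    "OCCUPATION_TYPE": ["Laborers", None, None, "Managers"],
})

# Percentage of missing values per column, highest first.
missing_pct = df.isna().mean().mul(100).sort_values(ascending=False)

# Number of fully duplicated rows.
n_duplicates = int(df.duplicated().sum())

print(missing_pct)
print(f"Number of duplicates in the data: {n_duplicates}")
```

Columns with a high missing percentage are natural candidates for imputation or removal before modelling.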

Feature Engineering

Feature Engineering Application Train Data

Data Preparation

Merging all 6 Datasets - Key = SK_ID_CURR
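A merge keyed on SK_ID_CURR might be sketched as below. It assumes that a secondary table with several rows per applicant is first aggregated to one row per SK_ID_CURR and then left-joined onto the application table; that aggregation step, the toy values, and the `BUREAU_CREDIT_` prefix are illustrative assumptions rather than the notebook's actual code.

```python
import pandas as pd

# Toy main table: one row per applicant.
app = pd.DataFrame({"SK_ID_CURR": [1, 2, 3], "TARGET": [0, 1, 0]})

# Toy secondary table: possibly several rows per applicant.
bureau = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2],
    "AMT_CREDIT_SUM": [1000.0, 500.0, 2000.0],
})

# Collapse the secondary table to one row per SK_ID_CURR.
bureau_agg = (
    bureau.groupby("SK_ID_CURR")["AMT_CREDIT_SUM"]
    .agg(["mean", "sum"])
    .add_prefix("BUREAU_CREDIT_")
    .reset_index()
)

# Left join keeps every applicant; validate guards against row duplication.
merged = app.merge(bureau_agg, on="SK_ID_CURR", how="left", validate="one_to_one")
print(merged)
```

Applicants without a match in the secondary table end up with missing values in the joined columns, which the imputation step then has to handle.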

Data Preprocessing

Imputing Categorical & Numerical Data (SimpleImputer)
Scaling Numerical Data (StandardScaler)
Encoding Categorical Data (OneHotEncoder)
Class Balancing (RandomOverSampler)
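The preprocessing steps listed above can be combined in a scikit-learn `ColumnTransformer`. The sketch below is one possible arrangement on toy data. The imputation strategies are assumptions, and the random oversampling is done with `sklearn.utils.resample` as a stand-in for the imbalanced-learn sampler named in the list, so that the example depends only on scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.utils import resample

# Toy features with missing values and an imbalanced target (made up).
X = pd.DataFrame({
    "AMT_CREDIT": [1000.0, np.nan, 3000.0, 4000.0, 5000.0, 6000.0],
    "CODE_GENDER": pd.Series(["F", "M", np.nan, "F", "M", "F"], dtype=object),
})
y = np.array([0, 0, 0, 0, 1, 1])

# Numerical columns: impute (median is an assumption), then scale.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: impute (most frequent is an assumption), then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer(
    [("num", numeric, ["AMT_CREDIT"]), ("cat", categorical, ["CODE_GENDER"])],
    sparse_threshold=0,  # always return a dense array
)
X_prepared = preprocess.fit_transform(X)

# Random oversampling: duplicate minority rows until both classes match.
majority = np.flatnonzero(y == 0)
minority = np.flatnonzero(y == 1)
extra = resample(
    minority, replace=True,
    n_samples=len(majority) - len(minority), random_state=0,
)
idx = np.concatenate([majority, minority, extra])
X_balanced, y_balanced = X_prepared[idx], y[idx]
print(X_balanced.shape, np.bincount(y_balanced))
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.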

Feature Selection

Model Used - LGBMClassifier
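One common way to select features with an LGBMClassifier is to rank them by the model's feature importances and keep those above a threshold. The sketch below uses scikit-learn's `SelectFromModel` for that; the median threshold, the synthetic data, and the fallback to a random forest when `lightgbm` is not installed are assumptions for illustration, and the notebook may select features differently.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel

try:
    from lightgbm import LGBMClassifier
    estimator = LGBMClassifier(n_estimators=50, random_state=0, verbose=-1)
except ImportError:
    # Fallback so the sketch still runs without lightgbm installed.
    from sklearn.ensemble import RandomForestClassifier
    estimator = RandomForestClassifier(n_estimators=50, random_state=0)

# Synthetic data standing in for the merged, preprocessed features.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Keep features whose importance is at least the median importance.
selector = SelectFromModel(estimator, threshold="median")
X_selected = selector.fit_transform(X, y)
selected_mask = selector.get_support()

print(f"Kept {X_selected.shape[1]} of {X.shape[1]} features")
```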

Classification

Models Used:

LGBM Classifier
RandomForest Classifier

Model Evaluation
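The metrics used in the notebook are not listed here. As a hedged illustration, the sketch below scores a RandomForest classifier with ROC AUC (the metric the Kaggle competition uses) and a confusion matrix on a stratified hold-out split. The synthetic data and all parameter values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data standing in for the prepared features.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# ROC AUC is computed from predicted probabilities of the positive class.
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)

# The confusion matrix uses hard predictions at the default 0.5 cut-off.
cm = confusion_matrix(y_test, model.predict(X_test))

print(f"ROC AUC: {auc:.3f}")
print(cm)
```

With a heavily imbalanced target, accuracy alone can look good for a model that never predicts a default, which is why a ranking metric such as ROC AUC is a reasonable choice.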

Hyperparameter Tuning
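The search strategy used in the notebook is not stated. One possible approach, shown here as a sketch, is a randomized search with cross-validation; the parameter grid, the choice of RandomForest as the tuned model, and the `roc_auc` scoring are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data standing in for the prepared training set.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Small, hypothetical search space.
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=4,            # number of sampled parameter combinations
    scoring="roc_auc",
    cv=3,
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
print(best_params, round(search.best_score_, 3))
```

After the search, `search.best_estimator_` holds a model refitted on the full training data with the best parameters found.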

Results

Languages

Language: Jupyter Notebook 100.0%