The objective of this project is to use historical loan application data to predict whether or not an applicant will be able to repay a loan.
Exploratory Data Analysis
- The distribution of the target classes in the training set is imbalanced.
- Let's count the empty cells in the dataframe, i.e. the missing data.
- The different types of features in the dataframe:
Feature Type | Number of features |
float64 | 65 |
int64 | 41 |
object | 16 |
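The three checks above (class balance, missing cells, feature types) can be sketched with pandas roughly as follows. This is a minimal illustration on a tiny made-up dataframe, not the project's actual code: the label column name `TARGET`, the other column names, and all values are assumptions chosen only so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the training dataframe (names and values are made up).
df = pd.DataFrame({
    "TARGET": [0, 0, 0, 1, 0, 0],  # assumed name of the label column
    "AMT_INCOME": [1.0, 2.5, np.nan, 4.0, np.nan, 6.0],
    "CNT_CHILDREN": [0, 1, 0, 2, 0, 1],
    "CODE_GENDER": ["M", "F", "F", None, "M", "F"],
})

# Class balance: share of rows per label value.
class_share = df["TARGET"].value_counts(normalize=True)

# Missing data: number of empty cells per column, keeping only columns that have any.
missing_counts = df.isnull().sum().sort_values(ascending=False)
missing_counts = missing_counts[missing_counts > 0]

# Feature types: number of columns per dtype (dtype names converted to strings).
dtype_counts = df.dtypes.astype(str).value_counts()

print(class_share)
print(missing_counts)
print(dtype_counts)
```

On the real training set, `dtype_counts` would be the place where the feature-type table above comes from.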
- We need to handle the 16 categorical variables in the dataset. Let's check the number of unique values in each of these 16 categorical features.
Features | Distinct Values |
NAME_CONTRACT_TYPE | 2 |
CODE_GENDER | 3 |
FLAG_OWN_CAR | 2 |
FLAG_OWN_REALTY | 2 |
NAME_TYPE_SUITE | 7 |
NAME_INCOME_TYPE | 8 |
NAME_EDUCATION_TYPE | 5 |
NAME_FAMILY_STATUS | 6 |
NAME_HOUSING_TYPE | 6 |
OCCUPATION_TYPE | 18 |
WEEKDAY_APPR_PROCESS_START | 7 |
ORGANIZATION_TYPE | 58 |
FONDKAPREMONT_MODE | 4 |
HOUSETYPE_MODE | 3 |
WALLSMATERIAL_MODE | 7 |
EMERGENCYSTATE_MODE | 2 |
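A count like the one in the table above can be obtained with `nunique()` on the non-numeric columns. The sketch below uses a small made-up dataframe with placeholder values. The final one-hot encoding step via `pd.get_dummies` is only one possible way to handle the categorical features and is an assumption here, not necessarily the approach this project takes.

```python
import pandas as pd

# Hypothetical stand-in for the training dataframe (placeholder values).
df = pd.DataFrame({
    "NAME_CONTRACT_TYPE": ["a", "b", "a", "a"],
    "CODE_GENDER": ["x", "y", "z", "y"],
    "AMT_CREDIT": [1000.0, 2500.0, 400.0, 800.0],
})

# Keep the non-numeric columns; in the table above these are the object-typed features.
categorical = df.select_dtypes(exclude="number")

# Distinct values per categorical feature (missing values are not counted).
distinct_counts = categorical.nunique()
print(distinct_counts)

# One possible handling (an assumption): one-hot encode every categorical column.
encoded = pd.get_dummies(df, columns=list(categorical.columns))
print(encoded.shape)
```

One-hot encoding adds one column per distinct value, so a high-cardinality feature such as ORGANIZATION_TYPE (58 values) widens the dataframe considerably; that trade-off is worth keeping in mind when choosing an encoding.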