vannguyen3007 / Home-Credit-Default-Risk

Home Credit Default Risk based on LightGBM model

The main objective of the Home Credit Default Risk machine learning competition was to predict whether an applicant will be able to repay a loan. This is a standard supervised classification task:

Supervised: The labels are included in the training data and the goal is to train a model to learn to predict the labels from the features

Classification: The label is a binary variable: 0 (will repay loan on time) or 1 (will have difficulty repaying loan)
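The two properties above can be made concrete with a small sketch. It uses a toy stand-in for the training table; the rows are invented, and the two feature columns (AMT_INCOME_TOTAL, AMT_CREDIT) are picked for illustration only and are not taken from this repository's notebook.

```python
import pandas as pd

# Toy stand-in for application_train; all values are invented for illustration.
train = pd.DataFrame({
    "SK_ID_CURR": [100001, 100002, 100003, 100004],
    "AMT_INCOME_TOTAL": [202500.0, 270000.0, 67500.0, 135000.0],
    "AMT_CREDIT": [406597.5, 1293502.5, 135000.0, 312682.5],
    "TARGET": [1, 0, 0, 0],
})

# Supervised: the label is a column of the training data.
y = train["TARGET"]
# The identifier is not a feature, so it is dropped together with the label.
X = train.drop(columns=["SK_ID_CURR", "TARGET"])

# Classification: the label takes only the values 0 and 1.
assert set(y.unique()) <= {0, 1}

# Because the label is 0/1, its mean is the share of loans with repayment difficulty.
default_rate = y.mean()
print(f"features: {list(X.columns)}")
print(f"share of loans with repayment difficulty: {default_rate:.2f}")
```

A classifier would then be fit on `(X, y)` and asked for predictions on the test table, which has the same feature columns but no TARGET.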

The data comprises 7 different files:

application_train/application_test: the main training and testing data with information about each loan application at Home Credit. Every loan has its own row and is identified by the feature SK_ID_CURR. The training application data comes with the TARGET column, where 0 means the loan was repaid and 1 means the loan was not repaid.

bureau: data concerning clients' previous credits from other financial institutions. Each previous credit has its own row in bureau, but one loan in the application data can have multiple previous credits.

bureau_balance: monthly data about the previous credits in bureau. Each row is one month of a previous credit, and a single previous credit can have multiple rows, one for each month of the credit length.

previous_application: previous applications for loans at Home Credit of clients who have loans in the application data. Each current loan in the application data can have multiple previous loans. Each previous application has one row and is identified by the feature SK_ID_PREV.

POS_CASH_BALANCE: monthly data about previous point of sale or cash loans clients have had with Home Credit. Each row is one month of a previous point of sale or cash loan, and a single previous loan can have many rows.

credit_card_balance: monthly data about previous credit cards clients have had with Home Credit. Each row is one month of a credit card balance, and a single credit card can have many rows.

installments_payments: payment history for previous loans at Home Credit. There is one row for every payment made and one row for every missed payment.
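Since every child table holds many rows per loan, it has to be rolled up to one row per SK_ID_CURR before it can be joined to the application data. The sketch below shows that pattern for bureau_balance and bureau on invented toy rows. The key SK_ID_BUREAU and the columns MONTHS_BALANCE and AMT_CREDIT_SUM are assumed here from the public dataset and are not described in this README; the output feature names are hypothetical, and this is not necessarily how the notebook in this repository builds its features.

```python
import pandas as pd

# Toy tables; all values are invented for illustration.
application = pd.DataFrame({"SK_ID_CURR": [1, 2, 3]})

bureau = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2],          # loan 1 has two previous credits
    "SK_ID_BUREAU": [10, 11, 20],     # assumed key of a previous credit
    "AMT_CREDIT_SUM": [1000.0, 500.0, 2000.0],
})

bureau_balance = pd.DataFrame({
    "SK_ID_BUREAU": [10, 10, 10, 11, 20, 20],  # one row per month per credit
    "MONTHS_BALANCE": [-2, -1, 0, 0, -1, 0],
})

# Step 1: monthly rows -> one row per previous credit.
months_per_credit = (
    bureau_balance.groupby("SK_ID_BUREAU")
    .agg(n_months=("MONTHS_BALANCE", "count"))
    .reset_index()
)
bureau = bureau.merge(months_per_credit, on="SK_ID_BUREAU", how="left")

# Step 2: previous credits -> one row per current loan.
per_loan = (
    bureau.groupby("SK_ID_CURR")
    .agg(
        bureau_credit_count=("SK_ID_BUREAU", "count"),
        bureau_credit_sum=("AMT_CREDIT_SUM", "sum"),
        bureau_months_mean=("n_months", "mean"),
    )
    .reset_index()
)

# Step 3: a left join keeps loans with no bureau history (they get NaN).
features = application.merge(per_loan, on="SK_ID_CURR", how="left")
print(features)
```

The same group, aggregate, and left-join steps would apply to the tables keyed by SK_ID_PREV. A left join is used so that applicants without any history stay in the result with missing values instead of being dropped.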

The diagram shows how the data is related:

[Figure: home_credit_overview]

My Solution's Key Components
