tutecode / home-credit-default-risk

Binary classification

Home Credit Default Risk

The Business problem

This is a binary classification task: we want to predict whether a person applying for home credit will be able to repay their debt. The model must predict 1 when the client will have payment difficulties, meaning a payment more than X days late on at least one of the first Y installments of the loan in our sample, and 0 in all other cases.

We will use the Area Under the ROC Curve (AUC-ROC) as the evaluation metric, so our models must return, for each input row, the probability that the loan will not be repaid.
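
To make the metric concrete, here is a minimal pure-Python sketch of what AUC-ROC measures: the fraction of (defaulting, non-defaulting) pairs in which the defaulting client receives the higher predicted probability, with ties counted as half. The function name `roc_auc` and the sample labels and scores are illustrative assumptions, not part of the project code. In practice you would more likely rely on a library implementation.

```python
def roc_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Illustrative helper (not from the project): ties count as half a win.
    Quadratic in the number of rows, so only suitable for small examples.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = payment difficulties) and predicted probabilities.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, probs)
print(auc)  # 3 of the 4 pairs are ranked correctly -> 0.75
```

Because only the ranking of the scores matters, the metric rewards well-ordered probabilities rather than hard 0/1 predictions.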

About the data

The original dataset is composed of multiple files with different information about loans taken. In this project, we will work exclusively with the primary files: application_train_aai.csv and application_test_aai.csv.

You don't have to worry about downloading the data: the Project.ipynb notebook downloads it automatically in Section 1 - Getting the data.
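
Once the files are available, reading them into Pandas might look like the sketch below. The helper `load_applications`, the in-memory sample data, and the column names (`ID`, `INCOME`, `TARGET`) are hypothetical stand-ins so the example runs on its own. The real files have their own schema, and the notebook may load them differently.

```python
import io

import pandas as pd


def load_applications(train_source, test_source):
    """Read the train and test application tables into DataFrames.

    Each source can be a file path or a file-like object.
    """
    train = pd.read_csv(train_source)
    test = pd.read_csv(test_source)
    return train, test


# Tiny in-memory stand-ins for the real CSVs; column names are hypothetical.
train_csv = io.StringIO("ID,INCOME,TARGET\n1,1000,0\n2,500,1\n3,750,0\n")
test_csv = io.StringIO("ID,INCOME\n4,900\n5,400\n")

app_train, app_test = load_applications(train_csv, test_csv)
print(app_train.shape, app_test.shape)

# Share of each class in the (hypothetical) label column.
print(app_train["TARGET"].value_counts(normalize=True))
```

With the real data you would pass the paths to application_train_aai.csv and application_test_aai.csv instead of the in-memory buffers. Their location on disk depends on where the notebook saves them.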

Technical aspects

To develop this Machine Learning model, you will primarily interact with the provided Jupyter notebook, Project.ipynb. This notebook guides you through all the steps to follow and the code to complete in the different parts of the project, each marked with a TODO comment.

The technologies involved are:

  • Python as the main programming language
  • Pandas for consuming data from CSV files
  • Scikit-learn for building features and training ML models
  • Matplotlib and Seaborn for the visualizations
  • Jupyter notebooks for interactive experimentation
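
As a rough illustration of how Scikit-learn fits into this kind of task, the sketch below trains a small pipeline on synthetic data and scores it with AUC-ROC. Everything here is an assumption for demonstration purposes: the random features, the choice of median imputation, scaling, and logistic regression, and the 75/25 split. It is not the model or preprocessing the notebook asks you to build.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data: 4 numeric features, binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

# Knock out about 5% of the values to mimic missing entries.
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Impute missing values, scale features, then fit a linear classifier.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Column 1 of predict_proba holds the probability of class 1.
val_proba = model.predict_proba(X_val)[:, 1]
val_auc = roc_auc_score(y_val, val_proba)
print(f"validation AUC: {val_auc:.3f}")
```

Wrapping the preprocessing and the estimator in one pipeline means the imputer and scaler are fitted on the training split only, which avoids leaking validation data into the features.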

Installation

A requirements.txt file is provided with all the Python libraries needed to run this project. To install the dependencies, run:

$ pip install -r requirements.txt

Note: We encourage you to install them inside a virtual environment.

Code Style

Following a style guide keeps the code's aesthetics clean and improves readability, making contributions and code reviews easier. Automated Python code formatters keep your codebase in a consistent style without any manual work on your end. If adhering to a specific coding style is important to you, employing an automated formatter to do that job is the obvious choice. This avoids bike-shedding on nitpicks during code reviews, saving you an enormous amount of time overall.

We use Black and isort for automated code formatting in this project. You can run them with:

$ isort --profile=black . && black --line-length 88 .

Wanna read more about Python code style and good practices? Please see:

Tests

We provide unit tests with the project so you can check that your code meets the minimum correctness requirements needed for approval. To run them, execute:

$ pytest tests/
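
For readers new to pytest, a test file is ordinary Python: pytest collects functions whose names start with `test_` and reports any failing `assert`. The sketch below is purely illustrative. The helper `check_probabilities` and both tests are hypothetical and are not the tests shipped in this project's tests/ directory.

```python
def check_probabilities(probs):
    """Hypothetical helper: True if every value is a valid probability."""
    return all(0.0 <= p <= 1.0 for p in probs)


def test_probabilities_are_in_unit_interval():
    # Boundary values 0.0 and 1.0 are valid probabilities.
    assert check_probabilities([0.0, 0.25, 1.0])


def test_out_of_range_values_are_rejected():
    # 1.2 is not a probability, so the check should fail.
    assert not check_probabilities([0.3, 1.2])
```

Saved in a file whose name starts with `test_` inside the tests/ directory, functions like these would be picked up by the command above.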

If you want to learn more about testing Python code, please read:
