YahyaKamel / Prediction-of-probability-of-default-using-Machine-Learing-in-R

Probability of default using Machine Learning in R

Probability of default in R

Yahya Kamel

12 January 2022

https://www.linkedin.com/in/yahya-kamel-5653b24b/

Harvard University

Data science capstone project

The objective of this project is to use Machine Learning (ML) models to predict the Probability of Default (PD) of retail loans. This prediction is vital for companies and institutions dealing with receivables and loans. Through this project, we get hands-on experience with different models and techniques and gain more insight from the prediction models. This project explores the following ML prediction models:

  • Logistic regression and related binomial GLM link functions (log, logit, cloglog, probit)
  • K-Nearest Neighbors (non-linear regression)
  • Support Vector Machines (radial kernel)
  • Neural networks
  • Naive Bayes
  • Decision tree
  • Random forest

This paper is divided into the following sections: (1) Data exploration, (2) Data wrangling, (3) Modeling, (4) Final PDs and conclusion.
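
To illustrate the first family in the model list, here is a minimal sketch of fitting binomial GLMs with several link functions in base R. It is not the project's own code: the data are simulated, the `default` indicator (1 = defaulted) and the data-generating coefficients are assumptions made for this example, and the log link is left out because it often needs starting values to converge.

```r
# Simulated stand-in data. Column names follow the dataset description,
# but every value here is synthetic and not taken from the project.
set.seed(42)
n <- 2000
loans <- data.frame(
  int_rate   = runif(n, 5, 25),
  annual_inc = rlnorm(n, meanlog = 11, sdlog = 0.5)
)

# Hypothetical data-generating process: risk rises with the interest rate
# and falls with income. `default` = 1 means the borrower defaulted
# (an assumption of this sketch).
p_true <- plogis(-4 + 0.15 * loans$int_rate - 0.3 * (log(loans$annual_inc) - 11))
loans$default <- rbinom(n, 1, p_true)

# Fit the same linear predictor under three link functions.
links <- c("logit", "probit", "cloglog")
fits <- lapply(links, function(l) {
  glm(default ~ int_rate + log(annual_inc),
      family = binomial(link = l), data = loans)
})
names(fits) <- links

# Predicted PDs: one column per link function, one row per borrower.
pd <- sapply(fits, predict, type = "response")
head(round(pd, 3))

# The fitted PDs from different links are usually highly correlated.
round(cor(pd), 3)
```

Comparing the columns of `pd` side by side is one simple way to see how much the choice of link function changes the estimated probabilities.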

The underlying dataset contains the following data on approximately 29 thousand retail borrowers:

  • loan_status: 0 if borrower has defaulted and 1 if borrower has not defaulted.
  • loan_amt: Total amount of loan
  • int_rate: Interest rate applicable to the loan
  • grade: Credit risk rating. A is the lowest in risk and G is the highest in risk.
  • annual_inc: Annual income of the borrower.
  • emp_length: Number of years of employment.
  • home_ownership: Rent/Mortgage/Own/Others.
  • age: Borrower’s age.
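
As a rough illustration of the schema above, the following sketch builds a tiny data frame with the same column names and derives a default indicator from `loan_status` using the coding stated in the list (0 = defaulted). All rows are invented for illustration, and the helper column `default` is a name introduced here, not part of the dataset.

```r
# Hypothetical toy rows using the documented column names; values are invented.
loans <- data.frame(
  loan_status    = c(1, 1, 0, 1, 0, 1),
  loan_amt       = c(5000, 12000, 8000, 15000, 3000, 10000),
  int_rate       = c(7.9, 11.5, 18.2, 9.3, 21.0, 13.1),
  grade          = factor(c("A", "B", "E", "A", "G", "C"), levels = LETTERS[1:7]),
  annual_inc     = c(60000, 45000, 30000, 85000, 22000, 52000),
  emp_length     = c(5, 2, 1, 10, 0, 7),
  home_ownership = factor(c("MORTGAGE", "RENT", "RENT", "OWN", "RENT", "MORTGAGE")),
  age            = c(34, 27, 23, 45, 21, 38)
)

# Under the coding described above, loan_status == 0 marks a default.
loans$default <- as.integer(loans$loan_status == 0)

# Inspect column types, the overall default rate, and the rate per grade.
str(loans)
mean(loans$default)
tapply(loans$default, loans$grade, mean)
```

Grades with no rows in the toy data show up as `NA` in the per-grade summary; on the full dataset each grade would have its own observed default rate.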

The expected time to run the code is 25 – 45 minutes, depending on the machine speed.

Disclaimer: The information, estimates, code, and conclusions mentioned in this paper are only for educational purposes and should not be relied upon for any decisions.

Languages

Language:R 100.0%