YahyaKamel/Prediction-of-probability-of-default-using-Machine-Learing-in-R

data-science machine-learning r probability-of-default expected expected-credit-losses ifrs-9

Probability of default in R

Yahya Kamel

12 January 2022

https://www.linkedin.com/in/yahya-kamel-5653b24b/

Harvard University

Data science capstone project

The objective of this project is to use machine learning (ML) models to predict the probability of default (PD) of retail loans. This prediction is vital for companies and institutions that deal with receivables and loans. Through this project, we gain hands-on experience with different models and techniques and draw more insight from their predictions. The project explores the following ML prediction models:

Logistic regression and related binomial models with different link functions (log, logit, cloglog, probit)
K-nearest neighbors (KNN)
Support vector machines (radial kernel)
Neural networks
Naive Bayes
Decision tree
Random forest

This paper is divided into the following sections:

1 Data exploration
2 Data wrangling
3 Modeling
4 Final PDs and conclusion
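
As an illustration of the link functions named in the model list above, the following is a minimal sketch, not the project's actual code. It fits a binomial GLM with the logit, probit, and cloglog links using base R's glm(). The data are synthetic, and the is_default column, the simulated coefficients, and the choice of predictors are assumptions made only for this example. The log link is left out because it often needs hand-picked starting values to converge.

```r
# Synthetic stand-in for the loan data (NOT the project's dataset).
set.seed(1)
n <- 2000
loans <- data.frame(
  int_rate   = runif(n, 5, 25),                      # interest rate in percent
  annual_inc = rlnorm(n, meanlog = 11, sdlog = 0.5)  # annual income
)

# Simulate defaults whose odds rise with the interest rate (assumed relationship).
p_true    <- plogis(-4 + 0.15 * loans$int_rate)
defaulted <- rbinom(n, 1, p_true)

# Follow the coding described in this document: 0 = defaulted, 1 = not defaulted.
loans$loan_status <- 1L - defaulted

# Hypothetical helper column so the model targets the default event directly.
loans$is_default <- as.integer(loans$loan_status == 0)

# Fit the same binomial model under three link functions.
links <- c("logit", "probit", "cloglog")
fits <- lapply(links, function(l) {
  glm(is_default ~ int_rate + log(annual_inc),
      family = binomial(link = l), data = loans)
})
names(fits) <- links

# Predicted PDs on the probability scale, one column per link.
pd <- sapply(fits, predict, type = "response")
print(head(round(pd, 3)))

# AIC gives a rough way to compare the fits on the same data.
print(sapply(fits, AIC))
```

On real data, the predicted PDs from the different links tend to agree closely in the middle of the range and differ mainly in the tails, so comparing them on a held-out sample is usually more informative than comparing in-sample fit alone.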

The underlying dataset contains the following data on approximately 29 thousand retail borrowers:

loan_status: 0 if borrower has defaulted and 1 if borrower has not defaulted.
loan_amt: Total amount of loan
int_rate: Interest rate applicable to the loan
grade: Credit risk rating. A is the lowest in risk and G is the highest in risk.
annual_inc: Annual income of the borrower.
emp_length: Number of years of employment.
home_ownership: Rent/Mortgage/Own/Others.
age: Borrower’s age.
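
A first look at data with this layout could be sketched as follows. This is a hedged example on a small synthetic data frame that only mimics the column names listed above. The value ranges, the category labels, and the 0.89 non-default share are arbitrary assumptions, not figures from the project's dataset.

```r
# Synthetic data frame with the same columns as described above (values are made up).
set.seed(42)
n <- 500
loans <- data.frame(
  loan_status    = rbinom(n, 1, 0.89),  # 0 = defaulted, 1 = not defaulted
  loan_amt       = round(runif(n, 1000, 35000)),
  int_rate       = round(runif(n, 5, 25), 2),
  grade          = factor(sample(LETTERS[1:7], n, replace = TRUE),
                          levels = LETTERS[1:7]),  # A (lowest risk) to G (highest)
  annual_inc     = round(rlnorm(n, meanlog = 11, sdlog = 0.5)),
  emp_length     = sample(0:40, n, replace = TRUE),
  home_ownership = factor(sample(c("RENT", "MORTGAGE", "OWN", "OTHER"),
                                 n, replace = TRUE)),
  age            = sample(20:70, n, replace = TRUE)
)

# Column types and a preview of the values.
str(loans)

# Overall default rate: share of rows where loan_status is 0.
default_rate <- mean(loans$loan_status == 0)
print(default_rate)

# Default rate per credit grade.
by_grade <- tapply(loans$loan_status == 0, loans$grade, mean)
print(round(by_grade, 3))

# Count of missing values per column, a common check before wrangling.
print(colSums(is.na(loans)))
```

Because loan_status uses 0 for a default, the comparison loans$loan_status == 0 is what turns the column into a default indicator. Taking mean(loans$loan_status) directly would give the non-default rate instead.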

The expected time to run the code is 25 – 45 minutes, depending on the machine speed.

Disclaimer: The information, estimates, code, and conclusions mentioned in this paper are only for educational purposes and should not be relied upon for any decisions.
