anchalbhalla / mortgage-default-prediction-icp4d

This is a demo for IBM Cloud Pak for Data. The project demonstrates the capabilities of IBM Cloud Pak for Data, an end-to-end analytics platform, by showing how a bank can predict whether a customer will default on a mortgage in the future.

Collect

In the Collect stage, data connections are created. For this demo, a data connection was created for Db2 Warehouse on Cloud as follows:

Steps
  1. Create an analytics project
  2. Create a data source
  3. Add host, username, password, DB name
  4. Test connection
  5. Add connection
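
The same four values from step 3 are what a programmatic Db2 client needs. The following is a minimal sketch, not part of the original project: `build_db2_dsn` is a hypothetical helper, and the port (50001), the SSL setting, and the example host and database name are assumptions. It only assembles and validates a connection string in the format used by the `ibm_db` Python driver; the actual connection test is left as a comment because it needs the driver and a reachable server.

```python
def build_db2_dsn(host, username, password, db_name, port=50001, use_ssl=True):
    """Assemble an ibm_db-style connection string from the connection form values.

    Hypothetical helper; the default port and SSL flag are assumptions.
    """
    fields = {
        "host": host,
        "username": username,
        "password": password,
        "db_name": db_name,
    }
    # Fail early if any of the four required values is empty.
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError("missing connection fields: " + ", ".join(missing))

    parts = [
        f"DATABASE={db_name}",
        f"HOSTNAME={host}",
        f"PORT={port}",
        "PROTOCOL=TCPIP",
        f"UID={username}",
        f"PWD={password}",
    ]
    if use_ssl:
        parts.append("SECURITY=SSL")
    return ";".join(parts) + ";"


if __name__ == "__main__":
    # Placeholder values, not real credentials.
    dsn = build_db2_dsn("db2.example.com", "demo_user", "demo_password", "BLUDB")
    print(dsn.replace("demo_password", "***"))
    # To test the connection for real (requires the ibm_db package):
    #   import ibm_db
    #   conn = ibm_db.connect(dsn, "", "")
    #   ibm_db.close(conn)
```

Validating the fields before connecting mirrors the "Test connection" step: a missing value is reported before any network call is attempted.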

Organize

This is the task of a data engineer. In this step, data discovery is performed, business glossary terms are added, and governance rules and policies are defined. Data lineage is also demonstrated.

Data Discovery

Used to analyse the data quality and assign terms to the datasets and project.

Steps
  1. Add the connection
  2. Select analyse data quality
  3. Select assign terms
  4. Click Discover
  5. Assign the terms from the appropriate category

Business Glossary

Create a business glossary that everyone in the organization understands.

Steps
  1. Create a category
  2. Create a term
  3. Save
  4. Repeat steps 2 to 3 until done

Governance

Compliance with the rules and policies of the industry is very important for every business.

Data Lineage

Data lineage shows the interconnections between all terms, rules, and datasets. It shows how the terms and rules are linked to the appropriate dataset.

Analyse

Visualizations

Visualizations explain the business problem at hand.

Insights gained:

  1. How many people have applied for loans online
  2. A lot of people reside in the Sharjah and Dubai area
  3. People with higher incomes default on their loans more often
  4. People buying houses with higher sale prices default on their loans more often
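
Insights like the last two come down to comparing default rates across groups. The sketch below shows one way such a comparison could be computed; it is an illustration only. The function `default_rate_by_group`, the column names `income_band` and `defaulted`, and the rows themselves are all made up and are not taken from the project's dataset or dashboards.

```python
from collections import defaultdict


def default_rate_by_group(rows, group_key, label_key="defaulted"):
    """Return {group: share of rows whose label is truthy} for every group."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row[label_key]:
            defaults[group] += 1
    return {group: defaults[group] / totals[group] for group in totals}


# Made-up rows for illustration only; not the project's data.
toy_rows = [
    {"income_band": "low", "defaulted": 0},
    {"income_band": "low", "defaulted": 0},
    {"income_band": "low", "defaulted": 0},
    {"income_band": "low", "defaulted": 1},
    {"income_band": "high", "defaulted": 0},
    {"income_band": "high", "defaulted": 1},
    {"income_band": "high", "defaulted": 0},
    {"income_band": "high", "defaulted": 1},
]

rates = default_rate_by_group(toy_rows, "income_band")
for band, rate in rates.items():
    print(f"{band}: {rate:.0%} defaulted")
```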

Modelling - SPSS

Various models were tried and tested using SPSS Modeler on IBM Cloud Pak for Data. The best one was Random Forest, which gave an accuracy of 90%.
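
Accuracy here means the share of predictions that match the actual outcome. The source does not say which data the 90% was measured on, so the following is only a sketch of the calculation itself, using made-up labels (1 = default, 0 = no default). The helper names `accuracy` and `confusion_counts` are hypothetical; the confusion counts are included because, for a default model, it helps to see which kind of error makes up the remaining share.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that equal the actual label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary label."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


# Made-up labels for illustration only.
actual_labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted_labels = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

print(accuracy(actual_labels, predicted_labels))
print(confusion_counts(actual_labels, predicted_labels))
```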

Modelling - Notebook

A notebook also has to be created for this project, since Open Scale currently works with Jupyter Notebooks only.

Infuse - Open Scale

To avoid biases in the model, Open Scale on IBM Cloud Pak for Data was used. It gives bank employees the explainability required for reporting and for explaining decisions to customers.
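
One common way to quantify bias is the disparate impact ratio: the rate of favorable outcomes for a monitored group divided by the rate for a reference group, where values well below 1 suggest the monitored group is treated less favorably. The sketch below computes that ratio by hand as an illustration of the idea. It is an assumption that this matches what Open Scale reports; the attribute `age_group`, the group names, the outcome labels, and the records are all hypothetical.

```python
def favorable_rate(records, group_key, group_value, outcome_key, favorable):
    """Share of records in one group that received the favorable outcome."""
    group = [r for r in records if r[group_key] == group_value]
    if not group:
        raise ValueError(f"no records for group {group_value!r}")
    hits = sum(1 for r in group if r[outcome_key] == favorable)
    return hits / len(group)


def disparate_impact(records, group_key, monitored, reference,
                     outcome_key="prediction", favorable="no_default"):
    """Favorable-outcome rate of the monitored group divided by the reference group's."""
    reference_rate = favorable_rate(records, group_key, reference, outcome_key, favorable)
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    monitored_rate = favorable_rate(records, group_key, monitored, outcome_key, favorable)
    return monitored_rate / reference_rate


# Made-up model outputs for illustration only.
toy_predictions = [
    {"age_group": "reference", "prediction": "no_default"},
    {"age_group": "reference", "prediction": "no_default"},
    {"age_group": "reference", "prediction": "no_default"},
    {"age_group": "reference", "prediction": "no_default"},
    {"age_group": "monitored", "prediction": "no_default"},
    {"age_group": "monitored", "prediction": "default"},
    {"age_group": "monitored", "prediction": "no_default"},
    {"age_group": "monitored", "prediction": "default"},
]

ratio = disparate_impact(toy_predictions, "age_group", "monitored", "reference")
print(f"disparate impact ratio: {ratio:.2f}")
```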

Open Scale on IBM Cloud Pak for Data - Explainability

Open Scale in action (in the application)

Front-End Application - Shiny App

The front end of this project was created as a Shiny app. The model API endpoint needs to be added to the app, which then calls the model and displays the results as a graph, as seen in the demo GIF.
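
The call to the model endpoint amounts to sending a JSON payload and reading the prediction out of the JSON response. The source does not describe the payload, so the sketch below assumes a Watson Machine Learning style scoring format (`input_data` with `fields` and `values`, `predictions` in the response). The helper names, the feature names, and the sample response are hypothetical, and no network call is made. The actual app is written in R; Python is used here only to show the shape of the exchange.

```python
import json


def build_scoring_payload(fields, rows):
    """Wrap feature names and rows in an assumed WML-style scoring payload."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row needs one value per field")
    return {"input_data": [{"fields": list(fields), "values": [list(row) for row in rows]}]}


def parse_predictions(response):
    """Turn an assumed WML-style response into one dict per scored row."""
    results = []
    for block in response.get("predictions", []):
        names = block["fields"]
        for values in block["values"]:
            results.append(dict(zip(names, values)))
    return results


# Hypothetical feature names and values.
payload = build_scoring_payload(
    ["income", "sale_price", "applied_online"],
    [[45000, 350000, "YES"]],
)
print(json.dumps(payload, indent=2))

# Hand-written sample response, used instead of a real call to the endpoint.
sample_response = {
    "predictions": [
        {"fields": ["prediction", "probability"], "values": [["default", [0.2, 0.8]]]}
    ]
}
print(parse_predictions(sample_response))
```

Keeping payload construction and response parsing in separate functions makes each side easy to check without a deployed model.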


Languages

Jupyter Notebook 83.3%, R 16.3%, CSS 0.4%