caesarmario / data-warehouse-credit-card-applicant-using-pentaho

This repository contains the OLTP source, the ETL process (using Pentaho Data Integration), and the OLAP schema for a credit card dataset. The dataset is taken from Kaggle (https://www.kaggle.com/rikdifos/credit-card-approval-prediction) and is part of the author's Capstone Project.

💳 Data Warehouse Credit Card Applicant 💳

using Pentaho Data Integration (PDI)/Kettle and Microsoft SQL Server 18 ⚙

.: 📄 Dataset taken from Kaggle :.





🖋 About Project

  • This repository contains the files needed to build a data warehouse for credit card applicants:

    • ETL files for Pentaho Data Integration (PDI)
    • SQL code to create the OLAP tables
    • SQL code to select data from the OLTP database
    • SQL code to perform random testing

    The dataset is provided by Seanny (rikdifos).

  • Using PDI and Microsoft SQL Server 18, the project also creates:

    • 2 dimension tables (Applicant_Dimension and CreditRecord_Dimension),
    • a time dimension (Time_Dimension), and
    • 1 fact table (CreditCard_Fact).

📌 Objectives

  • Perform ETL using PDI for both datasets.
  • Create time dimension using PDI.
  • Create fact table using PDI.

🧾 Data Set Description

  • The dataset description can be seen here.

🔌 Connection Configuration

username: sa
pass: qwer

📀🔌 OLTP Configuration

OLTP Config

💿🔌 OLAP Configuration

OLAP Config



⚙ ETL Process

👨‍💼 Application Record

Application

▶ Table Input Configuration

Table Input - Application

  • Import the application table from OLTP.

▶ Sort Rows Configuration

Sort Rows - Application

  • Sort the data by applicant ID.

▶ Unique Rows Configuration

Unique Rows - Application

  • Remove rows with a duplicate applicant ID.
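
The sort-then-deduplicate order matters because PDI's Unique Rows step only compares consecutive rows, so duplicates must be adjacent before they can be dropped. The following is a minimal Python sketch of that idea, not the PDI transformation itself; the column name `ID`, the `income` field, and the sample rows are assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_deduplicate(rows, key="ID"):
    """Sort rows by key, then keep the first row of each run of equal keys."""
    ordered = sorted(rows, key=itemgetter(key))  # stable sort keeps input order within a key
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


# Hypothetical sample rows; applicant 3 appears twice.
applications = [
    {"ID": 3, "income": 100},
    {"ID": 1, "income": 200},
    {"ID": 3, "income": 150},
]
unique_applications = sort_and_deduplicate(applications)
```

Because the sort is stable, the row kept for a duplicated ID is the one that appeared first in the input.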

▶ Replace in String Configuration

Replace in String - Application

  • Replace some values with more readable ones.

▶ Add Constants Configuration

Add Constants - Application

  • Add new columns with a constant date (October 1, 2021).

▶ Calculator Configuration

Calculator - Application

  • Calculate each applicant's date of birth and employment start date relative to the current date (October 1, 2021).
  • Calculate each applicant's age based on the current year (2021).
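
As a rough illustration of these two calculations, the sketch below assumes the source columns store day offsets counted backwards from the reference date (for example, -1 means one day before it), which is how the Kaggle dataset describes its day-count columns. The function names are hypothetical, and the actual Calculator step settings may differ.

```python
from datetime import date, timedelta

REFERENCE_DATE = date(2021, 10, 1)  # the constant "current date" from the README


def offset_to_date(days_offset, reference=REFERENCE_DATE):
    """Convert a day offset (negative = days before the reference) to a date."""
    return reference + timedelta(days=days_offset)


def age_in_years(date_of_birth, reference=REFERENCE_DATE):
    """Age as a difference of calendar years, i.e. 'based on the current year'."""
    return reference.year - date_of_birth.year


# Hypothetical applicant born 12005 days before the reference date.
date_of_birth = offset_to_date(-12005)
age = age_in_years(date_of_birth)
```

Note that a year-difference age ignores whether the birthday has already passed in 2021, so it can be one year higher than the exact age.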

▶ Filter Rows Configuration

Filter Rows - Application

  • Filter out applicant rows that contain null values.
  • Filter out applicants who are younger than 21 years old.
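
The two filter conditions can be expressed as a single predicate. This is only a sketch of the logic; the column names (`ID`, `Age`, `Income`) and the sample rows are assumptions, and an applicant who is exactly 21 is kept.

```python
MINIMUM_AGE = 21


def is_valid_applicant(row, minimum_age=MINIMUM_AGE):
    """Keep a row only if no field is null and the applicant is at least minimum_age."""
    has_nulls = any(value is None for value in row.values())
    # The age comparison is skipped when any field (including Age) is null.
    return not has_nulls and row["Age"] >= minimum_age


# Hypothetical sample rows.
applicants = [
    {"ID": 1, "Age": 30, "Income": 100},
    {"ID": 2, "Age": 20, "Income": 100},   # too young
    {"ID": 3, "Age": None, "Income": 100}, # null age
    {"ID": 4, "Age": 21, "Income": None},  # null income
    {"ID": 5, "Age": 21, "Income": 50},
]
valid_applicants = [row for row in applicants if is_valid_applicant(row)]
```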

▶ Add Sequence Configuration

Add Sequence - Application

  • Add an applicant index (to replace ID as the primary key).

▶ Select Values Configuration

Select Values - Application

  • Select the columns that will be loaded into OLAP.

▶ Table Output Configuration

Table Output - Application

  • Export the application table to OLAP (Application Dimension).



💶 Credit Record

Credit Record

▶ Table Input Configuration

Table Input - Credit Record

  • Import the credit record table from OLTP.

▶ Sort Rows Configuration

Sort Rows - Credit Record

  • Sort the data by applicant ID.

▶ Add Constants Configuration

Add Constants - Credit Record

  • Add new columns with a constant date (October 1, 2021).

▶ Calculator Configuration

Calculator - Credit Record

  • Calculate the month of each loan payment relative to the current date (October 1, 2021).
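
A sketch of this month calculation, assuming the credit record stores a month offset where 0 is the reference month and -1 is the month before it (as the Kaggle dataset describes its balance-month column). The function name `shift_months` is hypothetical and the result is normalized to the first day of the month.

```python
from datetime import date

REFERENCE_DATE = date(2021, 10, 1)  # the constant "current date" from the README


def shift_months(reference, months_offset):
    """Return the first day of the month that is months_offset months from reference."""
    total_months = reference.year * 12 + (reference.month - 1) + months_offset
    year, month_index = divmod(total_months, 12)
    return date(year, month_index + 1, 1)


payment_month = shift_months(REFERENCE_DATE, -10)  # ten months before October 2021
```

Working in "total months since year 0" avoids special cases when the offset crosses a year boundary.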

▶ Add Sequence Configuration

Add Sequence - Credit Record

  • Add CreditRecord_ID (to replace applicant ID as the primary key).

▶ Select Values Configuration

Select Values - Credit Record

  • Select the columns that will be loaded into OLAP.

▶ Table Output Configuration

Table Output - Credit Record

  • Export the credit record table to OLAP (Credit Record Dimension).



⌚ Time Dimension

Time

▶ Generate Rows Configuration

Generate Rows - Time

  • Generate a column with a fixed start date (January 1, 2016).

▶ Add Sequence Configuration

Add Sequence - Time

  • Add a sequence column running from 1 to 99999.

▶ Calculator Configuration

Calculator - Time

  • Add the sequence value to the start date to produce each subsequent date (e.g., January 2, 2016; January 3, 2016).
  • Create new columns (Day, Month, and Year).
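
The generate, sequence, and calculate steps amount to adding a running day count to a fixed start date and splitting the result into parts. The sketch below illustrates this in Python; the dictionary keys and the function name are assumptions, and only a few rows are generated instead of 99999.

```python
from datetime import date, timedelta

START_DATE = date(2016, 1, 1)  # fixed start date from the Generate Rows step


def generate_time_rows(count, start=START_DATE):
    """Build one row per sequence value, where date = start + sequence days."""
    rows = []
    for sequence in range(1, count + 1):
        current = start + timedelta(days=sequence)
        rows.append({
            "Sequence": sequence,
            "Date": current,
            "Day": current.day,
            "Month": current.month,
            "Year": current.year,
        })
    return rows


time_rows = generate_time_rows(31)
```

Because the sequence starts at 1, the first generated date is January 2, 2016, matching the example in the step description.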

▶ Data Grid Configuration

Data Grid - Time_1
Data Grid - Time_2

  • Create the month numbers and month names.

▶ Stream Lookup Configuration

Stream Lookup - Time

  • Match 'Month' from the Calculator step with 'No_Month' from the Data Grid step.
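
Conceptually, the Data Grid acts as a small static lookup table and the Stream Lookup attaches the matching month name to each row. A minimal sketch follows; `No_Month` and `Month` come from the step description, while `Month_Name` and the function name are assumed for illustration.

```python
# Static lookup table, playing the role of the Data Grid step.
MONTH_GRID = [
    {"No_Month": number, "Month_Name": name}
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
]

MONTH_NAME_BY_NUMBER = {row["No_Month"]: row["Month_Name"] for row in MONTH_GRID}


def add_month_name(row):
    """Attach the month name for row['Month']; None when there is no match."""
    return {**row, "Month_Name": MONTH_NAME_BY_NUMBER.get(row["Month"])}


enriched = add_month_name({"Day": 2, "Month": 1, "Year": 2016})
```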

▶ Modified JavaScript Value Configuration

Modified JavaScript - Time

  • Create the time ID using JavaScript code.
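
The script itself is not shown here, so the exact ID format is unknown. One common convention for a date key is a `yyyymmdd` integer, sketched below in Python purely as an assumption; the function name `make_time_id` is hypothetical and the project's JavaScript may build the ID differently.

```python
def make_time_id(year, month, day):
    """Build an integer key in yyyymmdd form (an assumed convention, not the project's)."""
    return year * 10000 + month * 100 + day


time_id = make_time_id(2016, 1, 2)
```

A key of this shape sorts chronologically and stays readable when inspecting the fact table.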

▶ Select Values Configuration

Select Values - Time

  • Select the columns that will be loaded into OLAP.

▶ Table Output Configuration

Table Output - Time

  • Export the time dimension to OLAP.



💳 Credit Card Fact

Credit Fact

▶ Table Input (Credit Record) Configuration

Table Input CR - Fact

  • Import the Credit Record dimension from OLAP.

▶ Table Input (Application) Configuration

Table Input Application - Fact

  • Import the Application dimension from OLAP.

▶ Stream Lookup 1 Configuration

Stream Lookup 1 - Fact

  • Join both dimension tables on applicant ID.

▶ Filter Rows Configuration

Filter Rows - Fact

  • Filter out applicant IDs that do not exist in both tables.
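
Taken together, the lookup and the filter behave like an inner join: each credit record is matched to its applicant, and rows without a match are dropped. The sketch below shows that logic under assumed column names (`ID`, `Age`, `Status`); it is not the PDI configuration.

```python
def lookup_join(credit_rows, applicant_rows, key="ID"):
    """Attach applicant fields to each credit row; drop credit rows with no applicant."""
    applicants_by_id = {row[key]: row for row in applicant_rows}
    joined = []
    for credit in credit_rows:
        match = applicants_by_id.get(credit[key])
        if match is None:
            continue  # applicant ID missing from the applicant table
        joined.append({**match, **credit})
    return joined


# Hypothetical sample data.
applicant_rows = [{"ID": 1, "Age": 30}, {"ID": 2, "Age": 45}]
credit_rows = [
    {"ID": 1, "Status": "C"},
    {"ID": 1, "Status": "0"},
    {"ID": 3, "Status": "X"},  # no matching applicant
]
fact_candidates = lookup_join(credit_rows, applicant_rows)
```

Applicants with no credit records (ID 2 here) never appear in the output either, since the join is driven by the credit rows.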

▶ Table Input (Time) Configuration

Table Input Time - Fact

  • Import the Time dimension from OLAP.

▶ Stream Lookup 2 Configuration

Stream Lookup 2 - Fact

  • Join the combined application and credit record data with the time dimension.

▶ Replace in String 1 Configuration

Replace in String 1 - Fact

  • Replace C, X, and 0 with 'Good Debt' (C: loan for that month is already paid; X: no loan for that month; 0: loan is 1 to 29 days overdue).
  • Replace 1, 2, 3, 4, and 5 with 'Bad Debt' (1: loan is 30 to 59 days overdue; 2: loan is 60 to 89 days overdue; 3: loan is 90 to 119 days overdue; 4: loan is 120 to 149 days overdue; 5: loan is more than 150 days overdue).
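
The status recoding above is a simple mapping from the raw codes to two labels. A minimal Python sketch, with a hypothetical function name and an explicit error for codes outside the documented set:

```python
GOOD_DEBT_CODES = {"C", "X", "0"}            # paid, no loan, or 1 to 29 days overdue
BAD_DEBT_CODES = {"1", "2", "3", "4", "5"}   # 30 or more days overdue


def classify_status(status):
    """Map a raw status code to 'Good Debt' or 'Bad Debt'."""
    if status in GOOD_DEBT_CODES:
        return "Good Debt"
    if status in BAD_DEBT_CODES:
        return "Bad Debt"
    raise ValueError(f"Unknown status code: {status!r}")


labels = [classify_status(code) for code in ["C", "X", "0", "1", "5"]]
```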

▶ Calculator Configuration

Calculator - Fact

  • Create 2 copies of the 'Status' column ('Good_Debt' and 'Bad_Debt').

▶ Replace in String Configuration

Replace in String - Fact

  • Good_Debt: 'Good Debt' is changed to 1, while 'Bad Debt' is changed to 0.
  • Bad_Debt: 'Good Debt' is changed to 0, while 'Bad Debt' is changed to 1.
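
The copy-and-replace steps turn one label column into two 0/1 indicator columns, which makes the measures easy to sum in the fact table. The sketch below illustrates the idea; the function name and sample labels are assumptions.

```python
def debt_flags(status_label):
    """Return the two indicator columns derived from a status label."""
    return {
        "Good_Debt": 1 if status_label == "Good Debt" else 0,
        "Bad_Debt": 1 if status_label == "Bad Debt" else 0,
    }


# Hypothetical labels for three monthly records of one applicant.
status_labels = ["Good Debt", "Bad Debt", "Good Debt"]
flags = [debt_flags(label) for label in status_labels]
total_good = sum(row["Good_Debt"] for row in flags)
total_bad = sum(row["Bad_Debt"] for row in flags)
```

With indicator columns, counting good or bad months per applicant becomes a plain sum over the flag.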

▶ Get System Info Configuration

Get System Info - Fact

  • Record the date and time when the ETL was performed.

▶ Select Values Configuration

Select Values - Fact

  • Select the columns that will be loaded into OLAP.

▶ Table Output Configuration

Table Output - Fact

  • Export the fact table to OLAP.



⭐ Star Schema

  • Star schema generated using Power BI.

Star Schema



👀 Before & After ETL Comparison

  • This section shows the data structure before and after ETL.

👨‍💼 Application Record

Applicant_1
Applicant_2

💶 Credit Record

Credit Record

⌚ Time Dimension

Time Dimension

💳 Credit Card Fact

Credit Card Fact



🙌 Support me!

👉 If you find this project useful, please ⭐ this repository 😆!


👉 More about myself: here
