data-science data-science-projects decision-tree decision-tree-classifier joblib jupyter-notebook machine-learning numpy pandas python scikit-learn

Introduction

NyakaMwizi is a machine learning model built to detect potentially fraudulent transactions.

The dataset used contains 1.3M instances and 23 features.

How to test out the model

Ensure you have Python 3.11 and Git installed.
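
If you are unsure whether both prerequisites are in place, a small script along these lines can check. It is only a sketch: treating 3.11 as a minimum rather than an exact requirement is an assumption, and the helper names `python_ok` and `git_ok` are made up for illustration.

```python
import shutil
import sys

# Version named in this README; treating it as a minimum is an assumption.
REQUIRED_PYTHON = (3, 11)


def python_ok(version=None, minimum=REQUIRED_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= minimum


def git_ok():
    """Return True if a `git` executable can be found on PATH."""
    return shutil.which("git") is not None


if __name__ == "__main__":
    print("Python 3.11 or newer:", python_ok())
    print("Git on PATH:", git_ok())
```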

Open a terminal and run the following commands.

Set everything up.

Linux/Mac

git clone https://github.com/SenZmaKi/NyakaMwizi && cd NyakaMwizi && python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt

Windows (Command Prompt)

git clone https://github.com/SenZmaKi/NyakaMwizi && cd NyakaMwizi && python -m venv .venv && .venv\Scripts\activate && pip install -r requirements.txt

Test the model.

python test_model.py

Visual Insights

These are insights I gained while exploring the dataset with graphs and computations.

They are listed in order of hierarchy.

Time

The time bracket in which the most fraudulent transactions occurred is between 10:00 PM and 4:00 AM.

Graph for frauds

Graph for non frauds
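
The 10:00 PM to 4:00 AM bracket wraps past midnight, so a simple `start <= hour < end` check does not work for it. The sketch below shows one way to test membership and to count frauds per hour. The timestamp format, the `(timestamp, is_fraud)` row shape, and the choice to exclude 4:00 AM itself are all assumptions, and the sample rows are made up rather than taken from the dataset.

```python
from collections import Counter
from datetime import datetime

NIGHT_START = 22  # 10:00 PM
NIGHT_END = 4     # 4:00 AM, treated as exclusive (an assumption)


def in_night_window(hour, start=NIGHT_START, end=NIGHT_END):
    """Return True if `hour` (0-23) falls in a window that wraps past midnight."""
    return hour >= start or hour < end


def fraud_counts_by_hour(rows):
    """Count fraudulent rows per hour of day.

    `rows` is an iterable of (timestamp_string, is_fraud) pairs, with
    timestamps in the assumed format "YYYY-MM-DD HH:MM:SS".
    """
    counts = Counter()
    for stamp, is_fraud in rows:
        if is_fraud:
            hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").hour
            counts[hour] += 1
    return counts


# Made-up rows for illustration only.
sample = [
    ("2020-01-01 23:15:00", 1),
    ("2020-01-02 02:40:00", 1),
    ("2020-01-02 13:05:00", 0),
    ("2020-01-02 14:30:00", 1),
]

counts = fraud_counts_by_hour(sample)
night_frauds = sum(n for hour, n in counts.items() if in_night_window(hour))
night_share = night_frauds / sum(counts.values())
print(dict(counts), night_share)
```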

Amount

Contrary to what you'd expect, most fraudulent transactions didn't involve exorbitant amounts of money.
Instead, they involved both reasonably large amounts (e.g. 30k) and average amounts.

Graph for frauds

Graph for non frauds

Age

The age bracket with the most fraudulent transactions is 30 to 70.
However, the same is true of non-fraudulent transactions, so this insight may be a misinterpretation.

Graph for frauds

Graph for non frauds
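
One way to check whether a bracket is genuinely riskier, or simply has more transactions overall, is to compare fraud rates instead of raw fraud counts. The following is a minimal sketch under stated assumptions: the `(age, is_fraud)` row shape, the 10-year bracket width, and the sample rows are all hypothetical and not taken from the dataset.

```python
from collections import defaultdict


def fraud_rate_by_bracket(rows, width=10):
    """Return {bracket_start: (fraud_count, total_count, fraud_rate)}.

    `rows` is an iterable of (age, is_fraud) pairs; ages are grouped into
    brackets of `width` years, e.g. 30-39 is keyed by 30.
    """
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for age, is_fraud in rows:
        start = (age // width) * width
        totals[start] += 1
        frauds[start] += int(bool(is_fraud))
    return {
        start: (frauds[start], totals[start], frauds[start] / totals[start])
        for start in sorted(totals)
    }


# Made-up rows for illustration only.
sample = [(25, 0), (27, 1), (34, 0), (35, 0), (36, 0), (38, 1), (39, 1), (62, 0)]

rates = fraud_rate_by_bracket(sample)
for start, (fraud_count, total, rate) in rates.items():
    print(f"{start}-{start + 9}: {fraud_count}/{total} fraudulent ({rate:.0%})")
```

In this made-up sample the 30 to 39 bracket has the most frauds but a lower fraud rate than the 20 to 29 bracket, which is the kind of effect that can make raw counts misleading. The same check could be applied to other grouped columns.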

Longitude and latitude

Some areas on the scatter matrix seemed to experience more fraudulent transactions.

Scatter matrix for frauds

Scatter matrix for non frauds

Job

Specific jobs experienced more fraudulent transactions, e.g. job 300.
However, this behaviour is in line with what is observed for non-fraudulent transactions, so it may be another misinterpretation.

Graph for frauds

Graph for non frauds

Final Model Performance

Model: DecisionTreeClassifier
Precision: 82.88%
Recall: 17.12%
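
For readers unfamiliar with the two metrics: precision is the share of transactions flagged as fraud that really are fraudulent, and recall is the share of all fraudulent transactions that the model flags. The sketch below computes both from confusion-matrix counts. The counts are illustrative placeholders, not this model's actual results, and `precision_recall` is a hypothetical helper, not a function from the repository.

```python
def precision_recall(tp, fp, fn):
    """Compute (precision, recall) from confusion-matrix counts.

    tp: frauds correctly flagged
    fp: legitimate transactions wrongly flagged
    fn: frauds the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative counts only, not taken from the model's evaluation.
precision, recall = precision_recall(tp=20, fp=5, fn=80)
print(f"Precision: {precision:.2%}, Recall: {recall:.2%}")
```

Read this way, high precision with low recall describes a model that is usually right when it flags a transaction but misses most frauds.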

About

A credit card fraud detection machine learning model

https://youtu.be/dQw4w9WgXcQ


Languages

Jupyter Notebook 99.4%, Python 0.6%
