supervised-machine-learning xgboost-algorithm classification

Binary Classification and Credit Card Fraud

/Notebooks
One notebook with basic PostgreSQL usage and another containing the bulk of the project in walkthrough format
streamlit_app.py
The core of a Streamlit app for fine-tuning the model's prediction threshold
helper_functions.py
Functions used in cleaning the data
card_fraud_predictions.pdf
The slides for the project presentation
app_preview.mov
A video preview of the streamlit app
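The Streamlit app tunes the prediction threshold: the model outputs a fraud probability, and any transaction scoring at or above the threshold is labeled fraud (1). The following is a minimal plain-Python sketch of that trade-off, not the app's actual code; the function names and the scores are hypothetical.

```python
from typing import List, Tuple


def confusion_at_threshold(
    probs: List[float], labels: List[int], threshold: float
) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN), predicting fraud (1) when probability >= threshold."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(
    probs: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall of the fraud class at a given threshold."""
    tp, fp, fn, _ = confusion_at_threshold(probs, labels, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical model scores and true labels (1 = fraud, 0 = valid)
    probs = [0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
    labels = [1, 1, 0, 1, 0, 0, 0, 0]
    for t in (0.25, 0.50, 0.75):
        p, r = precision_recall(probs, labels, t)
        print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

With these made-up scores, raising the threshold from 0.25 to 0.75 moves precision from 0.60 to 1.00 while recall drops from 1.00 to 0.67: fewer valid transactions get flagged, but more fraud slips through.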

Extras: Read the blog post.

Description

This repository contains a working model that predicts credit card fraud, built on a Kaggle dataset provided by the Vesta Corporation. The final model is an XGBoost classifier that predicts 1 for a fraudulent transaction and 0 for a valid one.

Features and Target Variables

Target Variable: Fraud or Valid
Features: Matched information, timedelta, transaction amount, debit vs. credit, product code, general card information

Data Used

Vesta Corporation Transaction Information

Tools Used

PostgreSQL
XGBoost
Logistic Regression
Random Oversampler
SMOTE
Streamlit
Seaborn
Matplotlib
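Random Oversampler and SMOTE both counter class imbalance (valid transactions vastly outnumber fraudulent ones) by adding minority-class rows to the training set. In practice these are usually called through a library such as imbalanced-learn; to show what each technique does, rather than how this project invoked it, here is a dependency-free sketch. It assumes binary labels, balances the classes 1:1, and needs at least two minority rows for SMOTE; the function names are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def random_oversample(
    X: Matrix, y: List[int], minority: int = 1, seed: int = 0
) -> Tuple[Matrix, List[int]]:
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    majority_count = sum(1 for label in y if label != minority)
    extra = [rng.choice(minority_rows) for _ in range(majority_count - len(minority_rows))]
    return X + extra, y + [minority] * len(extra)


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two rows."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def smote(
    X: Matrix, y: List[int], minority: int = 1, k: int = 2, seed: int = 0
) -> Tuple[Matrix, List[int]]:
    """SMOTE-style oversampling: each synthetic row lies on the segment between a
    minority row and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    majority_count = sum(1 for label in y if label != minority)
    synthetic: Matrix = []
    for _ in range(max(majority_count - len(minority_rows), 0)):
        base = rng.choice(minority_rows)
        # k nearest minority neighbours of `base`, excluding the row itself
        neighbours = sorted(
            (row for row in minority_rows if row is not base),
            key=lambda row, b=base: sq_dist(b, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return X + synthetic, y + [minority] * len(synthetic)
```

The design difference is the point: random oversampling only repeats existing fraud rows, which can encourage overfitting to them, while SMOTE creates new points between nearby fraud rows. Either should be applied to the training split only, so the test set keeps the real class balance.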

Potential Impact

Vesta Corporation released this dataset to encourage data scientists to help in the fight against credit card fraud. In 2018, the worldwide cost of credit card fraud was over $24 billion. I hope that my work, or the work of other data scientists exploring this dataset, will aid in the fight against fraudulent transactions.

Below is an image of the ROC curve from my final XGBoost model.
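An ROC curve plots the true positive rate against the false positive rate as the decision threshold sweeps from high to low, and the area under it (AUC) summarizes how well the model ranks fraud above valid transactions. The sketch below computes both in plain Python; it is not the code that produced the figure, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def roc_curve_points(scores: List[float], labels: List[int]) -> List[Point]:
    """Return (FPR, TPR) points, sweeping the threshold over each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])  # highest score first
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example tied at this score before emitting a point
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Grouping tied scores into a single step makes ties count as half-correct, which matches the usual definition of AUC as the probability that a random fraudulent transaction scores higher than a random valid one. scikit-learn exposes the same quantities through `sklearn.metrics.roc_curve` and `sklearn.metrics.roc_auc_score`.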

About

Metis Project 3: Supervised Machine Learning with a Categorical Target (predicting credit card fraud)

josephpcowell / cowell_proj_3