Binary Classification and Credit Card Fraud
Contents
- /Notebooks: One notebook with basic PostgreSQL usage and another with the bulk of the project in walkthrough format
- streamlit_app.py: The guts of a Streamlit app for fine-tuning the model by adjusting the prediction threshold
- helper_functions.py: Functions used in cleaning the data
- card_fraud_predictions.pdf: The slides for the project presentation
- app_preview.mov: A video preview of the Streamlit app
Extras: Read the blog post.
Description
This repository contains a working model that predicts credit card fraud, built on a Kaggle dataset provided by Vesta Corporation. The final model is an XGBoost classifier that outputs a binary label: 1 for a fraudulent transaction and 0 for a valid one.
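To illustrate how a binary label relates to a prediction threshold, here is a minimal sketch in plain Python. The function name `apply_threshold` and the example probabilities are hypothetical and are not taken from this repository; the sketch only assumes that the classifier produces a fraud probability per transaction.

```python
from typing import List, Sequence


def apply_threshold(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a transaction 1 (fraud) when its probability meets the threshold, else 0 (valid)."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical fraud probabilities, standing in for a classifier's output.
example_probabilities = [0.02, 0.35, 0.61, 0.97]

default_labels = apply_threshold(example_probabilities)        # threshold of 0.5
cautious_labels = apply_threshold(example_probabilities, 0.3)  # lower threshold flags more
```

Lowering the threshold flags more transactions as fraud, which typically trades additional false alarms for fewer missed fraud cases.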
Features and Target Variables
- Target Variable: Fraud or Valid
- Features: Matched information, timedelta, transaction amount, debit vs. credit, product code, general card information
Data Used
Tools Used
- PostgreSQL
- XGBoost
- Logistic Regression
- Random Oversampler
- SMOTE
- Streamlit
- Seaborn
- Matplotlib
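Fraud datasets are usually heavily imbalanced, which is why oversampling tools appear in the list above. The following is a rough sketch of the idea behind random oversampling, written with the standard library only. It is not the code used in this project, and `random_oversample` and the toy data are hypothetical names chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def random_oversample(
    rows: Sequence[Row], labels: Sequence[int], seed: int = 0
) -> Tuple[List[Row], List[int]]:
    """Duplicate randomly chosen minority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    fraud = [r for r, y in zip(rows, labels) if y == 1]
    valid = [r for r, y in zip(rows, labels) if y == 0]
    if not fraud or not valid:
        raise ValueError("both classes must be present")

    # Decide which class is the minority.
    if len(fraud) <= len(valid):
        minority, minority_label, majority, majority_label = fraud, 1, valid, 0
    else:
        minority, minority_label, majority, majority_label = valid, 0, fraud, 1

    # Sample with replacement from the minority class to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]

    balanced_rows = list(majority) + list(minority) + extra
    balanced_labels = [majority_label] * len(majority) + [minority_label] * (
        len(minority) + len(extra)
    )
    return balanced_rows, balanced_labels


# Toy data: four valid transactions and one fraudulent one (values are made up).
toy_rows = [(1.0,), (2.0,), (3.0,), (4.0,), (9.0,)]
toy_labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels)
```

Oversampling is normally applied to the training split only, so that evaluation data keeps the real class balance.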
Potential Impact
Vesta Corporation put out this dataset to encourage data scientists to help with the fight against credit card fraud. In 2018, the worldwide cost of credit card fraud was over $24 billion. With this in mind, I hope my work, or the work of other data scientists exploring this dataset, will be able to aid in the fight against fraudulent transactions.
Below is an image of the ROC curve from my final XGBoost model.
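For context on how such a curve is computed, here is a small sketch in plain Python that derives ROC points and the area under the curve from labels and scores. The function names and the four-sample data are hypothetical and are not the project's results; in practice a library routine would normally be used instead.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per distinct score threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    points = [(0.0, 0.0)]
    # Sweep thresholds from strictest to most permissive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Integrate the curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and scores for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

curve = roc_points(toy_labels, toy_scores)
auc = area_under_curve(curve)
```

Each point on the curve corresponds to one choice of prediction threshold, which is the same quantity the Streamlit app lets the user adjust.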