josecruzado21 / credit_card_fraud_detection

This project detects fraudulent credit card transactions. The dataset is highly imbalanced: fewer than 0.2% of the observations are labelled as fraudulent. Handling this imbalance requires taking into account the bank's objective (maximizing precision or recall) and its restrictions. Several classification algorithms (Logistic Regression, XGBoost, Random Forests) were tested and compared on performance and efficiency.
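
With fewer than 0.2% of observations labelled as fraudulent, accuracy says little: a model that marks every transaction as legitimate would be more than 99.8% accurate while catching no fraud. The sketch below is illustrative and not taken from the notebook. It computes precision and recall at a chosen score threshold, which is one way to express the trade-off between the two objectives. The function name `precision_recall_at` and the toy labels and scores are hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall_at(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (precision, recall) when scores >= threshold are flagged as fraud."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1  # fraud correctly flagged
        elif flagged and label == 0:
            fp += 1  # legitimate transaction flagged by mistake
        elif not flagged and label == 1:
            fn += 1  # fraud that slipped through
    # Guard against division by zero when nothing is flagged or no fraud exists.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data, made up for illustration: 2 frauds among 10 transactions.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.70, 0.60, 0.90]

for threshold in (0.5, 0.8):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.5: precision=0.50, recall=1.00
# threshold=0.8: precision=1.00, recall=0.50
```

On this toy data, raising the threshold trades recall for precision. A bank that mainly wants to catch as much fraud as possible would lean toward the lower threshold, while one that wants to avoid blocking legitimate customers would lean toward the higher one.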


Fraud Detection Algorithm

The data used in the notebook was obtained from Kaggle (Dal Pozzolo et al., 2015). Link to the dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud/data

According to the data dictionary, the variables in the dataset are the result of a dimensionality reduction process (PCA), applied to protect identities. The time variable represents the number of seconds elapsed between each transaction and the first transaction in the dataset.
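
Because the time variable is an offset in seconds rather than a timestamp, it can be converted into coarser units for exploration. The following is a minimal sketch, assuming only that the value is the number of seconds since the first transaction. The helper name `elapsed_to_day_hour` is hypothetical, and the hour it returns is an offset from the first transaction, not a clock time, since the start time of the dataset is not stated here.

```python
from typing import Tuple

SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def elapsed_to_day_hour(seconds: float) -> Tuple[int, int]:
    """Split an elapsed-seconds offset into (whole days, hour within that day)."""
    total_hours = int(seconds // SECONDS_PER_HOUR)
    # divmod returns (quotient, remainder): full days and leftover hours.
    return divmod(total_hours, HOURS_PER_DAY)


# A transaction 90,000 seconds (25 hours) after the first one
# falls in hour 1 of the second day of the recording window.
print(elapsed_to_day_hour(90_000))  # (1, 1)
```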

Languages

Language:Jupyter Notebook 100.0%