ireneliu521 / Credit-Card-Fraud_J2D_Project_Python

Apply 7 common machine learning algorithms to detect fraud while dealing with an imbalanced dataset

Credit Card Fraud Detection

In this project, we analyze a Kaggle dataset (www.kaggle.com/mlg-ulb/creditcardfraud/data) that contains 492 frauds out of 284,807 transactions, about 0.17% of the total. The transactions were made by European credit card holders in September 2013. Our objective is to fit machine learning models that detect fraud accurately despite this severe class imbalance. Because 28 of the variables are the result of a principal component analysis (PCA) transformation and no information about their meaning was given, we first drop the variables that have similar distributions. Next, we address the class imbalance with the synthetic minority over-sampling technique (SMOTE), resampling the dataset so that the numbers of fraudulent and normal transactions are equal. Finally, we compare the machine learning methods; XGBoost returned the highest AUC score.
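
To make the resampling and scoring steps concrete, here is a minimal standard-library sketch of the idea behind SMOTE and of the AUC metric. It is not the notebook's code: the project presumably uses library implementations (for example, `SMOTE` from `imblearn.over_sampling`), and the helper names `smote` and `roc_auc`, the parameter `k=5`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def roc_auc(labels, scores):
    """AUC as the probability that a random positive is scored above a
    random negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration (not the Kaggle dataset).
rng = random.Random(42)
normal = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
fraud = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]

# Oversample the minority class until both classes have the same size.
fraud_balanced = fraud + smote(fraud, len(normal) - len(fraud))
```

Because each synthetic point is an interpolation between two real minority samples, it always lies on the segment joining them. In a real pipeline, oversampling is normally applied to the training split only, so that the AUC is measured on untouched data.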

Languages

Jupyter Notebook 100.0%