pepecura/ml-on-gcp-fraud-detection-hybrid-approach

Fraud Detection in Financial Transactions

In this notebook, we work with simulated transaction data for fraud detection. We start by analysing fraud and creating new features, then train machine learning models with BigQuery ML and Vertex Tables and evaluate their performance. Finally, we use the ML model to score a test dataset and export the predictions to BigQuery so that action can be taken on them. The public data used in the exercise is downloaded from Kaggle: https://www.kaggle.com/ealaxi/paysim1

Steps we follow:

Download data
Load data
Explore data
Prepare data
Build an unsupervised k-means model in BigQuery ML and use its anomaly detection feature
Build supervised models in BigQuery ML and Vertex Tables
Batch prediction
Combine supervised and unsupervised model results in a hybrid approach to take action

K-means model in BigQuery ML
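
As a rough illustration of the k-means step, the sketch below builds the two BigQuery ML statements such a workflow typically needs: a CREATE MODEL statement with model_type = 'kmeans', and an ML.DETECT_ANOMALIES query that flags the most distant points as anomalies. The model name, table name, feature columns, number of clusters and contamination value are all assumptions chosen for illustration; they are not taken from the notebook.

```python
from typing import Sequence


def kmeans_create_model_sql(model: str, table: str,
                            features: Sequence[str],
                            num_clusters: int = 5) -> str:
    """Build a CREATE MODEL statement for a BigQuery ML k-means model."""
    cols = ", ".join(features)
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'kmeans', num_clusters = {num_clusters}, "
        f"standardize_features = TRUE) AS\n"
        f"SELECT {cols} FROM `{table}`"
    )


def detect_anomalies_sql(model: str, table: str,
                         contamination: float = 0.02) -> str:
    """Build an ML.DETECT_ANOMALIES query against a trained k-means model.

    `contamination` is the share of rows to flag as anomalous.
    """
    return (
        "SELECT * FROM ML.DETECT_ANOMALIES(\n"
        f"  MODEL `{model}`,\n"
        f"  STRUCT({contamination} AS contamination),\n"
        f"  TABLE `{table}`)"
    )


# Hypothetical names: replace with your own project, dataset and columns.
MODEL = "my_project.fraud.kmeans_model"
TABLE = "my_project.fraud.transactions_prepared"
FEATURES = ["amount", "oldbalanceOrg", "newbalanceOrig"]

create_sql = kmeans_create_model_sql(MODEL, TABLE, FEATURES, num_clusters=5)
anomaly_sql = detect_anomalies_sql(MODEL, TABLE, contamination=0.02)

# To run the statements, the google-cloud-bigquery client could be used:
# from google.cloud import bigquery
# client = bigquery.Client()
# client.query(create_sql).result()
# anomalies = client.query(anomaly_sql).to_dataframe()
```

For a k-means model, the anomaly query returns the input columns together with an is_anomaly flag and a normalized distance to the nearest centroid, which is what the hybrid step later combines with the supervised scores.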

Vertex Tables model for Fraud Detection

Conclusion: Transactions with high scores from the supervised model are selected for fraud investigation (95% of the selected group are fraudulent, and this group includes 90% of all fraudulent transactions). In the final step, we analyse the observations that have slightly lower scores but show abnormal behaviour in the unsupervised model. The majority of those transactions are also found to be fraudulent, so the hybrid approach reduces the number of false negatives.
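
The selection logic described in the conclusion can be sketched in plain Python. This is a minimal sketch, not the notebook's code: the field names (score, is_anomaly, is_fraud), the two thresholds and the toy rows are assumptions made up for illustration. The two figures quoted in the conclusion correspond to precision (share of selected transactions that are fraudulent) and recall (share of all fraudulent transactions that are selected).

```python
from typing import Dict, List, Tuple

Transaction = Dict[str, object]


def select_for_investigation(transactions: List[Transaction],
                             high: float = 0.9,
                             low: float = 0.5) -> List[Transaction]:
    """Hybrid rule: keep high supervised scores, plus slightly lower
    scores that the unsupervised model also flags as anomalous."""
    selected = []
    for tx in transactions:
        score = tx["score"]
        if score >= high:
            selected.append(tx)
        elif score >= low and tx["is_anomaly"]:
            selected.append(tx)
    return selected


def precision_recall(selected: List[Transaction],
                     all_transactions: List[Transaction]) -> Tuple[float, float]:
    """Precision and recall of the selected group against known labels."""
    true_pos = sum(1 for tx in selected if tx["is_fraud"])
    total_fraud = sum(1 for tx in all_transactions if tx["is_fraud"])
    precision = true_pos / len(selected) if selected else 0.0
    recall = true_pos / total_fraud if total_fraud else 0.0
    return precision, recall


# Made-up toy rows, not results from the notebook.
toy = [
    {"id": 1, "score": 0.97, "is_anomaly": False, "is_fraud": True},
    {"id": 2, "score": 0.70, "is_anomaly": True, "is_fraud": True},
    {"id": 3, "score": 0.70, "is_anomaly": False, "is_fraud": False},
    {"id": 4, "score": 0.10, "is_anomaly": True, "is_fraud": False},
]

# Supervised-only selection: setting low == high disables the anomaly branch.
supervised_only = select_for_investigation(toy, high=0.9, low=0.9)
hybrid = select_for_investigation(toy, high=0.9, low=0.5)

print(precision_recall(supervised_only, toy))
print(precision_recall(hybrid, toy))
```

In this toy example the supervised-only rule misses transaction 2, while the hybrid rule recovers it because the unsupervised model flags it as anomalous; that is the false-negative reduction the conclusion refers to.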

About

Detection of fraudulent transactions with machine learning, using a hybrid approach that combines supervised and unsupervised methods on GCP.
