In this notebook, we work with simulated transaction data for fraud detection. We start by analysing fraud patterns and creating new features, then train machine learning models with BigQuery ML and Vertex Tables and evaluate their performance. Finally, we use the ML model to score a test dataset and export the predictions to BigQuery so that action can be taken on them. The public data used in this exercise is downloaded from Kaggle: https://www.kaggle.com/ealaxi/paysim1

The notebook covers the following steps:
- Download data
- Load data
- Explore data
- Prepare data
- Build an unsupervised k-means model in BigQuery ML and use its anomaly detection feature
- Build supervised models in BigQuery ML and Vertex Tables
- Batch prediction
- Combine supervised and unsupervised model results in a hybrid approach to take action
Conclusion: Transactions with high scores from the supervised model are selected for fraud investigation: 95% of the selected group is fraudulent, and the group contains 90% of all fraudulent transactions. In the final step, we analyse the observations that have slightly lower scores but show abnormal behaviour in the unsupervised model. The majority of those transactions also turn out to be fraudulent, so the hybrid approach reduces the number of false negatives.
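
The hybrid selection rule described in the conclusion can be sketched in plain Python. This is only an illustration under stated assumptions: the thresholds `HIGH_SCORE` and `REVIEW_SCORE`, the `Scored` record, and the toy rows are hypothetical and do not come from the notebook's data or models. The sketch selects every transaction with a high supervised score, adds lower-scoring transactions that the unsupervised model flags as anomalous, and then computes the share of selected transactions that are fraudulent (precision) and the share of all fraud that was selected (recall).

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; the notebook's actual
# cut-offs are not stated in this overview.
HIGH_SCORE = 0.9    # supervised score at or above this is always selected
REVIEW_SCORE = 0.5  # lower band, selected only if also flagged as anomalous


@dataclass
class Scored:
    txn_id: str
    fraud_score: float  # supervised model's predicted fraud probability
    is_anomaly: bool    # flag from the unsupervised (k-means) model
    is_fraud: bool      # ground-truth label, used only for evaluation


def select_for_investigation(rows):
    """Return the txn_ids chosen by the hybrid rule, in input order."""
    selected = []
    for r in rows:
        if r.fraud_score >= HIGH_SCORE:
            selected.append(r.txn_id)
        elif r.fraud_score >= REVIEW_SCORE and r.is_anomaly:
            selected.append(r.txn_id)
    return selected


def precision_recall(rows, selected):
    """Precision: fraud share of the selection. Recall: share of all fraud selected."""
    chosen = set(selected)
    true_positives = sum(1 for r in rows if r.txn_id in chosen and r.is_fraud)
    total_fraud = sum(1 for r in rows if r.is_fraud)
    precision = true_positives / len(chosen) if chosen else 0.0
    recall = true_positives / total_fraud if total_fraud else 0.0
    return precision, recall


# Toy rows (made up) to exercise the rule.
rows = [
    Scored("t1", 0.97, False, True),
    Scored("t2", 0.92, True, True),
    Scored("t3", 0.70, True, True),    # lower score, but anomalous
    Scored("t4", 0.65, False, False),  # lower score, not anomalous
    Scored("t5", 0.10, True, False),   # anomalous, but score too low
]
selected = select_for_investigation(rows)
precision, recall = precision_recall(rows, selected)
print(selected, precision, recall)
```

In this toy data, the supervised threshold alone would miss `t3`; the anomaly flag brings it into the selection, which is the false-negative reduction the hybrid approach aims for.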