You are a contract data scientist/consultant hired by a new e-commerce site to help filter out fraudsters. The company outsources all data science work, so you must properly scope and present your solution to the manager before you embark on your analysis. You will also need to build a sustainable software project that you will deliver to the company's engineers so they can deploy your model in the cloud. Since others may use or extend your code, you NEED to properly encapsulate your code and leave plenty of comments.
- The team's slideshow presentation can be found here
- For an understanding of the project, see the overview provided by Galvanize
We applied gradient boosting to predict the probability that any given event is fraudulent.
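The write-up does not include the training code. What follows is only a minimal sketch of the approach using scikit-learn's `GradientBoostingClassifier`. The synthetic data from `make_classification`, the roughly 10% fraud rate, and the hyperparameters are all illustrative assumptions. They are not the project's actual features or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the event data (assumption): about 10% positives ("fraud").
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8,
    weights=[0.9, 0.1], random_state=0,
)

# Stratify so the train and test splits keep the same fraud rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# Hyperparameters are placeholders; the project's real settings are unknown.
model = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0,
)
model.fit(X_train, y_train)

# classes_ is sorted as [0, 1], so column 1 of predict_proba is P(fraud).
p_fraud = model.predict_proba(X_test)[:, 1]
print(f"held-out log loss: {log_loss(y_test, p_fraud):.3f}")
```

Log loss scores the predicted probabilities directly. For that reason, `predict_proba` is the output that matters here, not the hard labels from `predict`.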
As a baseline, we used the overall fraud rate as the predicted fraud probability for every event.
- This baseline yielded a log loss of 0.33
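Suppose every event receives the same prediction, equal to the overall fraud rate p. Its log loss is then the binary entropy -(p ln p + (1 - p) ln(1 - p)), which makes it a natural baseline for any model to beat. The sketch below checks this identity with scikit-learn's `log_loss`. The helper names, the labels, and the 10% rate are hypothetical and are not the project's data.

```python
import numpy as np
from sklearn.metrics import log_loss


def constant_rate_log_loss(y):
    """Log loss of predicting the overall fraud rate for every event."""
    y = np.asarray(y)
    rate = y.mean()  # fraction of events labeled fraud
    return log_loss(y, np.full(y.shape, rate))


def binary_entropy(p):
    """Closed form of the same quantity: -(p ln p + (1 - p) ln(1 - p))."""
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))


# Hypothetical labels with a 10% fraud rate, for illustration only.
y = np.array([1] * 10 + [0] * 90)
print(constant_rate_log_loss(y))  # ≈ 0.325
print(binary_entropy(0.1))        # same value
```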
Our model achieved a log loss of 0.06
Risk thresholds on the predicted probability of fraud:
- Low: up to 0.5%
- Med: 0.5% to 70%
- High: above 70%
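The three tiers can be expressed as a small lookup on the model's predicted probability. The function name `risk_tier` is hypothetical. The handling of values exactly at 0.5% and 70% is an assumption, because the bounds above do not say which tier owns each boundary.

```python
def risk_tier(p_fraud: float) -> str:
    """Map a predicted fraud probability to the Low / Med / High tiers.

    Assumed boundary handling: exactly 0.5% is Low and exactly 70% is Med.
    """
    if not 0.0 <= p_fraud <= 1.0:
        raise ValueError(f"probability out of range: {p_fraud}")
    if p_fraud <= 0.005:
        return "Low"
    if p_fraud <= 0.70:
        return "Med"
    return "High"


for p in (0.001, 0.05, 0.9):
    print(p, risk_tier(p))
```

Keeping the cutoffs in one function means the engineers deploying the model can adjust the thresholds in a single place.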
Fraud detection
- Medium threshold (0.5%):
  - Model detects 99% of fraud
  - 44% false positive rate (FPR)
- High threshold (70%):
  - Model detects 78% of fraud
  - 0.4% FPR
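The detection figures are recall (the share of fraud flagged) and false positive rate (the share of legitimate events flagged) at a given cutoff. This sketch assumes an event is flagged when its predicted probability exceeds the threshold. Under that assumption, the Medium threshold corresponds to a 0.5% cutoff and the High threshold to 70%. The helper name `detection_rates` and the toy labels and scores are hypothetical.

```python
import numpy as np


def detection_rates(y_true, p_fraud, threshold):
    """Return (recall, false positive rate) when flagging events with p_fraud > threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    flagged = np.asarray(p_fraud) > threshold
    recall = (flagged & y_true).sum() / y_true.sum()   # share of fraud caught
    fpr = (flagged & ~y_true).sum() / (~y_true).sum()  # share of legitimate events flagged
    return float(recall), float(fpr)


# Toy example: 4 fraud and 6 legitimate events with made-up scores.
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p = [0.95, 0.80, 0.40, 0.003, 0.60, 0.02, 0.004, 0.001, 0.75, 0.002]
print(detection_rates(y, p, 0.005))  # Medium threshold
print(detection_rates(y, p, 0.70))   # High threshold
```

On the toy data, the 0.5% cutoff catches 3 of 4 fraud cases but flags half of the legitimate events. The 70% cutoff catches 2 of 4 and flags 1 of 6. This mirrors the trade-off between recall and false positives reported above.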
Fraud Detection in Event Postings (Galvanize g88 - Spring 2019)