Lighthouse-Reports / suspicion_machine

Fraud detection related data and scripts to share with partners.

Suspicion Machine

This project is part of the Surveillance Newsroom at Lighthouse Reports. Other projects from Lighthouse Reports can be found at the main GitHub repo.

Project Overview

This project contains the source code for an investigation into the city of Rotterdam's use of a welfare fraud prediction algorithm. The model used by Rotterdam is a Gradient Boosting Machine (GBM) trained on a dataset of 12,707 past investigations into welfare fraud. The repository also contains the source code and results of the experiments conducted to identify potential bias in the GBM model.

Reporting Partners

  • WIRED
  • Vers Beton
  • Follow the Money
  • Argos

Methods Used

  • Statistical Parity
  • Inferential Statistics
  • Data Wrangling
  • Data Visualisation

Technologies

  • R
  • Jupyter Notebook

Project Description

This project explores the impact of beneficiaries' attributes on their risk scores. The GBM model evaluates welfare beneficiaries on 315 features and assigns each a risk score in the range [0, 1], where a higher score indicates a greater risk of having committed some kind of 'illegality'. Illegality encompasses everything from simple mistakes on a form to serious fraud. This project uses data obtained from Rotterdam Municipality through press questions and public records requests, and employs the following approaches:

  • Testing whether the fraud prediction system violates Statistical Parity and Conditional Statistical Parity
  • Synthetic data generation for replication, since the original training data cannot be shared publicly due to GDPR concerns
  • Extraction of the model's decision trees
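
As a rough illustration of the first approach, the sketch below compares mean risk scores across groups (statistical parity) and then repeats the comparison within each level of a second attribute (conditional statistical parity). It is written in plain Python for readability and is not the project's own test code. The column names `gender`, `has_children` and `risk_score`, the helper function names, and the toy records are all hypothetical, and comparing mean scores is only one of several ways to formulate the test (another is to compare the share of each group flagged above a threshold).

```python
from collections import defaultdict
from statistics import mean


def statistical_parity_gap(records, group_key, score_key="risk_score"):
    """Return (largest difference in mean score between groups, per-group means)."""
    scores_by_group = defaultdict(list)
    for row in records:
        scores_by_group[row[group_key]].append(row[score_key])
    means = {group: mean(scores) for group, scores in scores_by_group.items()}
    return max(means.values()) - min(means.values()), means


def conditional_parity_gaps(records, group_key, condition_key, score_key="risk_score"):
    """Return the statistical parity gap within each stratum of condition_key."""
    strata = defaultdict(list)
    for row in records:
        strata[row[condition_key]].append(row)
    return {
        level: statistical_parity_gap(rows, group_key, score_key)[0]
        for level, rows in strata.items()
    }


# Toy records with hypothetical column names; not real or synthetic project data.
records = [
    {"gender": "f", "has_children": True, "risk_score": 0.70},
    {"gender": "f", "has_children": False, "risk_score": 0.50},
    {"gender": "m", "has_children": True, "risk_score": 0.60},
    {"gender": "m", "has_children": False, "risk_score": 0.40},
]

gap, group_means = statistical_parity_gap(records, "gender")
conditional_gaps = conditional_parity_gaps(records, "gender", "has_children")

print("Mean score per group:", group_means)
print("Statistical parity gap:", round(gap, 3))
print("Gap within each has_children stratum:", conditional_gaps)
```

A gap close to zero means the groups receive similar scores on average. A gap that persists within every stratum suggests the difference is not explained by the conditioning attribute alone. Judging whether an observed gap is larger than chance would call for an inferential test on top of this.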

Repo Overview

  • Follow setup instructions to reproduce results and create your own experiments
  • Raw Synthetic Data is kept here
  • Synthetic Data Generation model is kept here
  • Experiment results are kept here
  • The trained model file is kept here
  • The source code used to train the model is here

Notebooks

Contributing Members

Justin-Casimir Braun | https://github.com/jusbraun | justin-casimir@lighthousereports.com

Htet Aung | https://github.com/NecklessCage | htet@lighthousereports.com

Gabriel Geiger | https://github.com/gheghi18 | gabriel@lighthousereports.com

Eva Constantares | eva@lighthousereports.com

About

License: GNU General Public License v3.0


Languages

Jupyter Notebook 96.4%, R 3.6%