cromano8 / CMPD_Traffic_Stops

Charlotte, NC traffic stops for 2020-2021. ML modeling attempting to predict outcomes of traffic stops and interactive visualizations for EDA purposes.

Predicting Charlotte Traffic Stop Outcomes

Research Questions

  1. Which of the general attributes correlate the most with the outcome of the traffic stop (i.e., search conducted, verbal warning, written warning, citation issued, no action, arrest)?
  2. What driver attributes (race, ethnicity, gender, age) correlate the most with the outcome of the traffic stop?
  3. What officer attributes (race, gender, years of service) correlate the most with the outcome of the traffic stop?

Findings

  • Driver_Race was a relevant predictor variable for our model, but had lower relevance than anticipated (Chi-Squared).
  • The fairness metric had a minor positive effect on model performance but was limited by the Driver_Race variable's low correlation (p-percent).
  • Final model performance was lower than we expected.
  • Through EDA and comparison with third-party local demographic and income data, we found that there is likely discrimination in traffic stops that the model does not capture, because of sampling bias: the dataset only contains drivers who have already been stopped. This may help explain the lower-than-expected relevance of the Driver_Race variable. (Figure: traffic stops by race)
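
As a rough illustration of the Chi-Squared relevance check mentioned in the findings, here is a minimal sketch in plain Python. The counts are invented for illustration and are not taken from the CMPD dataset, and the function name `chi_squared_statistic` is hypothetical; the project's own notebooks may compute this differently.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table.

    `table` is a list of rows; each row holds the observed counts per column.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows are two driver groups, columns are two outcomes
# (citation issued, verbal warning). Not real CMPD data.
observed = [
    [30, 20],
    [20, 30],
]
stat, dof = chi_squared_statistic(observed)
print(f"chi2 = {stat:.2f}, dof = {dof}")
```

A larger statistic relative to the degrees of freedom suggests a stronger association between the two variables. A library routine such as `scipy.stats.chi2_contingency` also returns a p-value, though note that it applies a continuity correction to 2x2 tables by default.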

Steps and Approaches

Preprocess/clean the dataset:

  • Check for missing values
  • Consistency (spelling, etc.)
  • Skewness → normalization
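
The three cleaning steps above could look roughly like the following pandas sketch. The sample rows, the column names other than Driver_Race, and the skewness threshold of 1.0 are all assumptions made for illustration; they are not the actual CMPD schema or the settings used in this project.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; values and most column names are assumptions.
df = pd.DataFrame({
    "Driver_Race": ["White", " white", "BLACK", None, "Black "],
    "Driver_Age": [22, 35, 41, 19, 88],
    "Officer_Years_of_Service": [1.0, 2.0, 2.5, 3.0, 30.0],
})

# 1. Check for missing values per column.
missing_counts = df.isna().sum()

# 2. Consistency: trim whitespace and unify capitalization of labels.
df["Driver_Race"] = df["Driver_Race"].str.strip().str.title()

# 3. Skewness: log-transform strongly right-skewed numeric columns.
numeric_cols = ["Driver_Age", "Officer_Years_of_Service"]
skew_before = df[numeric_cols].skew()
for col in numeric_cols:
    if skew_before[col] > 1.0:  # arbitrary illustrative threshold
        df[col] = np.log1p(df[col])

print(missing_counts)
print(df["Driver_Race"].dropna().unique())
```

The log transform is only one way to reduce skew; standardization or a power transform may suit other columns better.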

Identify variables to be used (EDA)

Identify most appropriate models to use

  • Multi-class prediction: 5 different outcomes
  • Binary prediction: search conducted or not
  • Sklearn fairness metrics
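
The findings mention a p-percent fairness metric. The source does not say which implementation was used, so the following is only a plain-Python sketch of the general idea: compare the rate of positive predictions between two groups and report the smaller rate divided by the larger. The function name `p_percent` and the sample data are hypothetical.

```python
def p_percent(predictions, sensitive):
    """Ratio of positive-prediction rates between two groups.

    `predictions` holds 0/1 model outputs and `sensitive` holds 0/1 group
    membership. Returns a value in [0, 1]; 1.0 means both groups receive
    positive predictions at the same rate.
    """
    rates = []
    for group in (0, 1):
        group_preds = [p for p, s in zip(predictions, sensitive) if s == group]
        if not group_preds:
            raise ValueError(f"no samples for group {group}")
        rates.append(sum(group_preds) / len(group_preds))

    if max(rates) == 0:
        return 1.0  # neither group receives any positive prediction
    return min(rates) / max(rates)


# Hypothetical binary "search conducted" predictions for two driver groups.
predictions = [1, 0, 0, 0, 1, 1, 0, 0]
sensitive = [0, 0, 0, 0, 1, 1, 1, 1]
score = p_percent(predictions, sensitive)
print(f"p-percent score: {score:.2f}")
```

Here group 0 is flagged 25% of the time and group 1 is flagged 50% of the time, so the score is 0.5. A score closer to 1.0 indicates more similar treatment of the two groups by the model.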

Analysis

  • Naive/Gaussian Bayes
  • Decision Tree/Random Forest
  • Logistic regression
  • Parameter tuning
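
A comparison of the model families listed above, with parameter tuning, might be set up along these lines in scikit-learn. This is a sketch on synthetic data: the generated features stand in for the encoded traffic stop variables, and the hyperparameter grids are small illustrative choices rather than the ones used in the Preprocessing_and_Modeling notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic 5-class data standing in for the encoded traffic stop features.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models with small, illustrative hyperparameter grids.
candidates = {
    "gaussian_nb": (GaussianNB(), {"var_smoothing": [1e-9, 1e-7]}),
    "decision_tree": (DecisionTreeClassifier(random_state=0),
                      {"max_depth": [3, 5, None]}),
    "random_forest": (RandomForestClassifier(random_state=0),
                      {"n_estimators": [50, 100]}),
    "logistic_regression": (LogisticRegression(max_iter=1000),
                            {"C": [0.1, 1.0, 10.0]}),
}

results = {}
for name, (model, grid) in candidates.items():
    # 3-fold cross-validated grid search on the training split.
    search = GridSearchCV(model, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "test_accuracy": search.score(X_test, y_test),
    }

for name, info in results.items():
    print(name, info["best_params"], round(info["test_accuracy"], 3))
```

For the binary "search conducted or not" task, the same loop applies with a two-class target; with imbalanced classes, a scoring option such as `"f1"` or `"balanced_accuracy"` may be more informative than accuracy.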

All modeling efforts can be found within the Preprocessing_and_Modeling folder.

Create Streamlit App

A Streamlit app makes the dataset easily accessible to anyone. Preliminary EDA results are available here. Source code and files are located in the Streamlit folder, so the app can be run from an IDE rather than through the link provided.

Create R Shiny App

An R Shiny app makes the dataset easily accessible to anyone here. The app contains EDA insights for the dataset and compares the location of each traffic stop to local population demographics and income. Source code and files for the R project are located in the R_Shiny folder, so the app can be run from RStudio rather than through the link provided. To do so:

  • Download all items in the repository as a zip file by clicking 'Code' button above
  • Unzip the downloaded folder
  • Open the folder "R_Shiny"
  • Open CMPD_Traffic_Stops_Shiny.R
  • Click "Run App" in the top-right corner of the code panel

Create Tableau dashboard

Tableau Public dashboard to concisely show EDA on the dataset, while also identifying main trends present within the data. The dashboard can be found here

Important Endnotes

We realize that by analyzing this dataset, we could shed light on a potentially controversial topic, that is, how the race/ethnicity/gender of the driver/officer can help to predict the outcome of a traffic stop.

If this is a finding, it should not be used to guide police targeting, but rather to illuminate bias in traffic stops. Note that being able to predict the outcome of a traffic stop based on race, ethnicity or gender is inherently unethical/discriminatory. Ideally, traffic stop outcomes based on these characteristics should be proportional to the demographic population of the area under observation.

Regardless of our findings, we would like to be explicit about the fact that none of our findings are causal. Rather, they shed light on correlations in the data that may be used to dismantle bias in policing.

Data Source

Data is available at the Charlotte Data Portal.

This project was the result of two courses in the Data Science & Business Analytics master's program at UNC-Charlotte. The modeling and Streamlit app were completed in Minwoo Lee's Applied Machine Learning class alongside Mitchell Jones, Srikar Vavilala, Jordan Register, and Marianna Shaver. The R Shiny app was created in Chase Romano's Visual Analytics class alongside Joseph Burnes and Syed Muhammad Suffwan. The Tableau dashboard is my own work, also created in Visual Analytics.


Languages

Python 58.0%, R 42.0%