car-accident austin random-forest neural-networks hyperparameter-optimization kfold-cross-validation sqlalchemy postgresql aws pipeline flask api pandas scikit-learn javascript html-css heroku-deployment shell dashboard

Austin Driver Score Predictor

Selected Topic

Analyzing motor vehicle accident data in Austin.

Using a variety of tools, we will look at how different factors may contribute to the severity of a car crash.

Reason for Selecting Topic

We want to build a model that predicts the severity of a crash from different factors.
This information could be used by car insurance companies and by consumers shopping for cars with safety in mind.

Description of the Source Data

TxDOT Crash Query System. This database records a wide range of details about each car accident, including but not limited to:

Weather, longitude, latitude, severity, and the time and date of each accident, over the course of many years.
For this analysis, we used only data from 2018 to 2020. Additionally, we used the NHTSA WebAPIs: https://one.nhtsa.gov/webapi/Default.aspx?SafetyRatings/API/5
This web API gives access to the U.S. Department of Transportation's New Car Assessment Program (NCAP) 5-Star Safety Ratings, which we used to retrieve overall safety ratings for the cars involved in the crashes.
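The README links the NHTSA API documentation but does not show the calls themselves. The sketch below assumes the current public endpoint layout at api.nhtsa.gov: first list the rated variants for a model year, make, and model, then fetch each variant by its VehicleId and read the OverallRating field. The base URL, the field names (Results, VehicleId, OverallRating), and the helper names are assumptions, not the project's code. The URL builders and the parser run offline; lookup_rating needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the NHTSA 5-Star Safety Ratings API; the README links
# the older one.nhtsa.gov/webapi documentation page instead.
BASE_URL = "https://api.nhtsa.gov/SafetyRatings"


def model_url(year, make, model):
    """URL listing the rated vehicle variants for one year/make/model."""
    return f"{BASE_URL}/modelyear/{year}/make/{quote(make)}/model/{quote(model)}"


def vehicle_url(vehicle_id):
    """URL with the full ratings record for a single vehicle variant."""
    return f"{BASE_URL}/VehicleId/{vehicle_id}"


def fetch_json(url, timeout=10):
    """GET a URL and decode its JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def overall_rating(payload):
    """Return the first numeric OverallRating in a decoded response, else None."""
    for result in payload.get("Results", []):
        rating = str(result.get("OverallRating", ""))
        if rating.isdigit():  # skips values such as "Not Rated"
            return int(rating)
    return None


def lookup_rating(year, make, model):
    """Two-step lookup: list the variants, then return the first rated one."""
    variants = fetch_json(model_url(year, make, model)).get("Results", [])
    for variant in variants:
        rating = overall_rating(fetch_json(vehicle_url(variant["VehicleId"])))
        if rating is not None:
            return rating
    return None
```

Keeping the parser separate from the HTTP call makes it easy to test against saved responses and to cache ratings per make and model, since many crashes involve the same vehicles.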

Questions we hope to answer with the data

How do different cars perform in terms of frequency and severity of car accidents?
How do different weather types affect the frequency of car accidents?
How do demographics affect the frequency of car accidents?
Where do most accidents occur in Austin?

Communication Protocols

To keep each other updated on the status of our parts of the project, we messaged each other regularly through Slack and held regular Zoom meetings.

Tools

Creating Database
- PostgreSQL
- Amazon Web Services (AWS)
Connecting to Database
- Psycopg2
Analyzing Data
- Pandas
Machine Learning
- Imbalanced-learn
- Scikit-Learn
- TensorFlow
Dashboard
- Tableau
- JavaScript
- Flask
- HTML
- CSS
- Heroku

Machine Learning Model

The preliminary data includes columns that describe the environment for each crash that took place in Austin, TX. These features include the weather condition, crash severity, day of the week, vehicle make and model, etc.
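Features such as weather condition, day of the week, and vehicle make are categorical, and scikit-learn models need them as numbers. The README does not say how they were encoded; one common approach is one-hot encoding with pandas, sketched here on a few hypothetical rows (the column names are illustrative, not the TxDOT schema).

```python
import pandas as pd

# Hypothetical rows; column names are illustrative, not the TxDOT schema.
crashes = pd.DataFrame({
    "weather": ["Clear", "Rain", "Clear", "Fog"],
    "day_of_week": ["Mon", "Sat", "Sat", "Tue"],
    "vehicle_make": ["Toyota", "Ford", "Honda", "Ford"],
    "severity": [0, 1, 0, 1],
})

CATEGORICAL = ["weather", "day_of_week", "vehicle_make"]

# One indicator column per category value, e.g. weather_Rain.
features = pd.get_dummies(crashes[CATEGORICAL], columns=CATEGORICAL)
target = crashes["severity"]

print(list(features.columns))
```

When scoring new inputs later (for example, from the dashboard), the same dummy columns must be produced in the same order; `new_features.reindex(columns=features.columns, fill_value=0)` is one way to align them.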
An ERD showing the relationships between the features in the different datasets can be found here.
After connecting to the database, we printed the column headers to see all of the available features. From that list, we chose the features we believed would correlate most strongly with crash severity.
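Listing the available columns does not require loading the data. A minimal sketch, assuming a psycopg2 connection (any DB-API 2.0 connection behaves the same way): run a query that returns no rows and read the names from `cursor.description`. The table name `crashes` and the helper `column_names` are hypothetical; an in-memory SQLite table stands in for PostgreSQL so the example runs on its own.

```python
import sqlite3


def column_names(conn, table):
    """Return the column names of `table` without fetching any rows.

    Works with any DB-API 2.0 connection, e.g. one from psycopg2.connect(...).
    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT * FROM {table} LIMIT 0")
        # Each description entry is a 7-item sequence; item 0 is the column name.
        return [col[0] for col in cur.description]
    finally:
        cur.close()


# Demo: an in-memory SQLite table stands in for the PostgreSQL one.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE crashes (crash_id INTEGER, weather TEXT, crash_severity TEXT)")
print(column_names(demo, "crashes"))  # ['crash_id', 'weather', 'crash_severity']
```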
We split the data into training and test sets with scikit-learn's train_test_split function, using the default 75%/25% split.
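With neither `test_size` nor `train_size` set, `train_test_split` holds out 25% of the rows for testing. The sketch below uses hypothetical stand-in data; the `stratify=y` variant is an optional addition, not something the README says was used, that keeps the injury/no-injury ratio the same in both splits.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows, 3 features, binary target.
X = [[i, i % 7, i % 3] for i in range(100)]
y = [i % 2 for i in range(100)]

# Default split: 75% train, 25% test.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(len(X_train), len(X_test))  # 75 25

# Optional: stratify on the label so both splits keep the same class balance.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, random_state=42, stratify=y
)
```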
After careful analysis, we found that the linear models reached only about 50% accuracy. Adjusting their parameters, such as increasing the maximum number of iterations and n_jobs, did not improve accuracy. We then tried a neural network to see whether it would do better. Even with 8 layers (using ReLU, Swish, and Sigmoid activations), accuracy stayed at 54%, with a loss of 0.69. In other words, our models predicted crash severity correctly only about half of the time.
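The loss value is worth reading alongside the accuracy. Assuming the network was trained with binary cross-entropy (the README does not name the loss; a sigmoid output for a 0/1 label suggests it), a loss of about 0.69 is what a model gets by predicting 0.5 for every crash, because -ln(0.5) = ln 2 ≈ 0.693. The helper below is a plain-Python illustration of that formula, not the project's training code.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy: -(y*ln p + (1-y)*ln(1-p)) averaged over rows.

    Predictions are clipped into [eps, 1 - eps] so log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [0, 1, 1, 0, 1, 0]
# A model that always answers "50% chance of injury":
print(round(binary_cross_entropy(labels, [0.5] * len(labels)), 4))  # 0.6931
```

A confident, correct model drives this value toward 0, so a loss stuck near 0.69 together with roughly 50% accuracy suggests the network was close to guessing.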
We decided to use a decision tree model for our machine learning model. We grouped our crash severity data into two categories: 0 (no injury) and 1 (injury). The benefit of this model is that it fits our binary outcome. The downside is that the original data has five classifications (no injury, possible injury, non-incapacitating injury, severe injury, and fatal injury), and the binary model cannot distinguish among them; predicting all five would require retraining the model as a multiclass classifier.
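The binary grouping amounts to a lookup from the five severity classes to 0 or 1, followed by fitting a classifier on the encoded features. A minimal sketch: the label spellings follow the list in the paragraph above but are assumptions (the raw TxDOT values may be coded differently), and the two feature columns are hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier

# The five severity classes collapsed to 0 = no injury, 1 = injury.
# Label spellings are assumptions; the raw TxDOT values may differ.
SEVERITY_TO_BINARY = {
    "no injury": 0,
    "possible injury": 1,
    "non-incapacitating injury": 1,
    "severe injury": 1,
    "fatal injury": 1,
}


def to_binary(labels):
    """Map raw severity labels (case-insensitive) to 0/1; unknown labels raise KeyError."""
    return [SEVERITY_TO_BINARY[label.strip().lower()] for label in labels]


# Tiny hypothetical, already-encoded feature rows (e.g. is_raining, is_weekend).
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = to_binary(["No Injury", "No Injury", "Severe Injury", "Fatal Injury"])

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.predict([[1, 0]]))  # [1]
```

Raising on an unknown label, rather than silently defaulting, surfaces spelling differences in the source data early.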

Presentation

Our presentation can be found here: Google Slide Presentation.

Dashboard

We used Tableau as part of our dashboard. Our Tableau analysis can be found here: Tableau Dashboard.
The other part of our dashboard is an interactive webpage that uses machine learning to calculate a driver score. Users select the data that pertains to them (age, type of car, etc.) and click a button to receive a risk score.
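One way to wire the button to the model is a small Flask endpoint that receives the user's selections as JSON and returns a score. This is a sketch under assumptions, not the deployed app: the route `/score`, the feature names, and the choice of risk score = 100 × P(injury) are all hypothetical, and any fitted classifier with `predict_proba` can be passed in.

```python
from flask import Flask, jsonify, request

# Hypothetical feature names; the real form fields are not listed in this README.
FEATURES = ["age", "vehicle_rating", "is_raining"]


def create_app(model):
    """Build the app around any fitted classifier exposing predict_proba.

    A deployed app would typically load the model once at startup,
    e.g. with joblib.load("model.joblib"); that file name is an assumption.
    """
    app = Flask(__name__)

    @app.route("/score", methods=["POST"])
    def score():
        data = request.get_json(force=True)
        try:
            # One row, in the same column order the model was trained on.
            row = [[float(data[name]) for name in FEATURES]]
        except (KeyError, TypeError, ValueError):
            return jsonify(error=f"expected numeric fields: {FEATURES}"), 400
        # For 0/1 labels, column 1 of predict_proba is P(injury).
        p_injury = model.predict_proba(row)[0][1]
        return jsonify(risk_score=round(100 * float(p_injury)))

    return app
```

The page's JavaScript could then POST the selected values to `/score` and display the returned `risk_score`. Taking the model as an argument keeps the route testable with a stub instead of a trained model.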
The dashboard repository can be found here: Link Dashboard Repo.
The link to the deployed dashboard is https://austin-driver-score.herokuapp.com/.
Dashboard Live Demo

About

Used Python and scikit-learn to analyze Austin car crash data from 2018 to 2020, and built an interactive dashboard that uses a Random Forest Classifier to calculate a driver score from user-supplied features.
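The repository topics mention a random forest, hyperparameter optimization, and k-fold cross-validation, but the README does not show how they were combined. One common pattern, sketched here with synthetic data and an illustrative parameter grid (both assumptions), is a grid search over a `RandomForestClassifier` scored with stratified k-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the encoded crash features and 0/1 injury labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Small, hypothetical grid; a real search would cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    # Each candidate is scored on 5 stratified folds; the mean picks the winner.
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

By default `GridSearchCV` refits the best configuration on all the data passed to `fit`, so `search.best_estimator_` is ready to be saved for the dashboard; the held-out test split should still be kept out of the search.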

https://cedric-lutonda-driver-score.herokuapp.com/

