According to the CDC motor vehicle safety division, one in five car accidents is caused by a distracted driver. Sadly, this translates to 425,000 people injured and 3,000 people killed by distracted driving every year.
State Farm hopes to improve these alarming statistics, and better insure their customers, by testing whether dashboard cameras can automatically detect drivers engaging in distracted behaviors. Given a dataset of 2D dashboard camera images, State Farm is challenging Kagglers to classify each driver's behavior.
- Build image classification models to detect distracted-driver behavior and predict the likelihood of what the driver is doing in each picture.
- Use ensemble methods to boost the performance of the models for different classes.
- Create a customized dataset to test the generality of the models.
All the training data used in this project come from the State Farm Distracted Driver Detection competition. The dataset has ten classes (e.g. safe driving, texting, drinking) and contains 22,424 2D dashboard camera images. A customized dataset of 946 dashboard camera images, covering the same ten classes, was also used to test the generality of the models.
- Image Augmentation: Apply data augmentation methods from `torchvision.transforms` (e.g. RandomPerspective, Normalize) to the train/val datasets.
- Fine-Tuning Pretrained Models: Fine-tune different pretrained models (e.g. densenet121, efficientnet_b0, mobilenet_v3_large) on the image dataset.
- Ensemble Methods: Implement ensemble methods (e.g. majority vote, average likelihood, weighted majority vote) to boost the performance of the models.
- Evaluation: Evaluate the models on the test set and customized dataset. Display the predictions on images and make a demonstration video.
| Model Name | Test Accuracy | Customized Dataset Accuracy |
|---|---|---|
| DenseNet (Without Data Augmentation) | 0.86862 | - |
| EfficientNet B0 (Without Data Augmentation) | 0.94407 | - |
| MobileNet V3 (Without Data Augmentation) | 0.90767 | - |
| DenseNet | 0.89968 | 0.68 |
| EfficientNet B0 | 0.96848 | 0.84 |
| MobileNet V3 | 0.94673 | 0.8775 |
| Ensemble Model (Majority Vote) | 0.96919 | 0.9825 |
| Ensemble Model (Average Likelihood) | 0.97991 | 0.93 |
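The two ensemble rows above combine the per-model predictions with simple tensor operations. A minimal sketch, assuming each model outputs softmax probabilities of shape `(n_samples, n_classes)` (the function names here are illustrative, not the repo's API):

```python
import torch

def average_likelihood(prob_list):
    """Average the per-class probabilities across models, then take the
    class with the highest mean probability for each sample."""
    stacked = torch.stack(prob_list)          # (n_models, n_samples, n_classes)
    return stacked.mean(dim=0).argmax(dim=1)  # (n_samples,)

def majority_vote(prob_list):
    """Each model votes for its own argmax class; the most common vote
    wins (torch.mode breaks ties toward the lower class index)."""
    votes = torch.stack([p.argmax(dim=1) for p in prob_list])  # (n_models, n_samples)
    return votes.mode(dim=0).values           # (n_samples,)
```

For example, with three models voting on two samples, majority vote follows the per-model argmax counts, while average likelihood can overturn two weak votes when the third model is very confident.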
[1] Krizhevsky, A. (2014). One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997.
[2] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4510-4520).
[3] Xie, S., Girshick, R., Dollár, P., Tu, Z., & He, K. (2017). Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1492-1500).
[4] Howard, A., Sandler, M., Chu, G., Chen, L. C., Chen, B., Tan, M., ... & Adam, H. (2019). Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1314-1324).
[5] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[6] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1-9).
This repository is licensed under the Apache-2.0 License - see the LICENSE file for details.