santiago-puchginer-snkeos / vr-project

Deep Understanding of Traffic Scenes for Autonomous Driving


In this project we leverage state-of-the-art deep neural network architectures for image classification, object detection and semantic segmentation to build a framework that supports autonomous driving by understanding the vehicle's surrounding scene.

Authors

V1P - V1sion emPowering

Name             | E-mail                        | GitHub
Arantxa Casanova | ar.casanova.8@gmail.com       | ArantxaCasanova
Belén Luque      | luquelopez.belen@gmail.com    | bluque
Anna Martí       | annamartiaguilera@gmail.com   | amartia
Santi Puch       | santiago.puch.giner@gmail.com | santipuch590

Report

A detailed report about the work done can be found in this Overleaf project.

Additionally, a Google Slides presentation can be found in this link.

DNN weights

HDF5 weights of the trained deep neural networks can be found here.

Datasets analysis

Prior to the experiments for each problem type (classification, detection and segmentation), we performed an analysis of the datasets to facilitate the interpretation of the results.

How to use the code

See this README for instructions on how to run the experiments and utilities.


Object recognition

In order to choose a well-performing object recognition network for our system, we have tested several CNNs with different architectures: VGG (2014), ResNet (2015) and DenseNet (2016). These networks have been trained both from scratch and by fine-tuning pre-trained weights. The experiments have been carried out on several datasets: the TT100K classification dataset and the BelgiumTS dataset for traffic signs, and the KITTI Vision Benchmark for cars, trucks, cyclists and other typical elements of driving scenes. Finally, we have tuned several parameters of the architectures and the training process in order to improve the results.
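As an illustration of the fine-tuning setup, the sketch below loads ImageNet weights into a ResNet50 base and attaches a new classification head. This is a minimal sketch assuming Keras 2 with the Applications module; the class count and input shape are illustrative placeholders, not the framework's actual configuration.

```python
# Minimal fine-tuning sketch (assumes Keras 2); NUM_CLASSES is a placeholder
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import SGD

NUM_CLASSES = 45  # hypothetical: number of classes in the target dataset

# Convolutional base with ImageNet weights and no classifier head
base = ResNet50(weights='imagenet', include_top=False,
                input_shape=(224, 224, 3))

# New classification head sized for the target dataset
x = GlobalAveragePooling2D()(base.output)
predictions = Dense(NUM_CLASSES, activation='softmax')(x)
model = Model(inputs=base.input, outputs=predictions)

# Optionally freeze the base for a warm-up stage, then unfreeze
# and continue training with a lower learning rate
for layer in base.layers:
    layer.trainable = False

model.compile(optimizer=SGD(lr=1e-3, momentum=0.9),
              loss='categorical_crossentropy', metrics=['accuracy'])
```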

Contributions to the code

  • models/denseNet_FCN.py - adaptation of this implementation of DenseNet to the framework, generalizing the axes of the batch normalization layers, which were previously correct only for Theano's dimension ordering.

  • models/resnet.py - adaptation of the ResNet50 Keras model to the framework, adding L2 regularization for the weights (not included in Keras Applications).

  • models/vgg.py - changed the implementation to include L2 regularization for the weights (not included in Keras Applications).

  • callbacks/callbacks.py and callbacks/callbacks_factory.py - implemented a new callback, LRDecayScheduler, that allows the user to decay the learning rate by a predefined factor (such that lr <- lr / decay_factor) at specific epochs, or alternatively at every epoch (see the sketch after this list).

  • analyze_datasets.py - analyzes all the datasets in the specified folder by counting the number of images per class and per set (train, validation, test), and creates a CSV file with the results along with a plot of the (normalized) class distribution for all sets.

  • optimization.py - automatically generates the config files for the optimization of a model, using a grid search, and launches the experiments.

  • run_all.sh - bash script to launch all the experiments in this project, including object recognition, object detection and semantic segmentation.
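The LRDecayScheduler callback mentioned above can be sketched roughly as follows. This is a simplified reconstruction from the description, assuming a Keras optimizer that exposes its learning rate as the lr variable; the exact signature in callbacks/callbacks.py may differ.

```python
import keras.backend as K
from keras.callbacks import Callback

class LRDecayScheduler(Callback):
    """Divide the learning rate by decay_factor at the given epochs,
    or at every epoch if decay_epochs is None."""

    def __init__(self, decay_epochs=None, decay_factor=10.0):
        super(LRDecayScheduler, self).__init__()
        self.decay_epochs = decay_epochs  # e.g. [20, 40]; None = every epoch
        self.decay_factor = decay_factor

    def on_epoch_begin(self, epoch, logs=None):
        # Keras counts epochs from 0; user-facing epochs start at 1
        if self.decay_epochs is None or (epoch + 1) in self.decay_epochs:
            lr = float(K.get_value(self.model.optimizer.lr))
            K.set_value(self.model.optimizer.lr, lr / self.decay_factor)

# Usage: model.fit(x, y, epochs=60, callbacks=[LRDecayScheduler([20, 40])])
```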

Milestones

  1. VGG:
  • Analyze dataset - We extracted a CSV file, statistical conclusions and plots of the class distributions in the dataset (TT100K_TrafficSigns). Plots and comments are in the report.
  • Train from scratch using TT100K.
  • Compare crop vs. resize as input strategies.
  • Evaluate different pre-processings in the configuration file: subtracting the mean and normalizing by the standard deviation, feature-wise.
  • Transfer learning from the TT100K dataset to the BelgiumTS dataset.
  • Train from scratch and fine-tune VGG with the KITTI dataset.
  2. ResNet:
  • Implement it and adapt it to the framework
  • Train from scratch with the TT100K dataset
  • Fine-tune from ImageNet weights with the TT100K dataset
  • Fine-tune from ImageNet weights with the KITTI dataset
  • Compare fine-tuning vs. training from scratch
  3. DenseNet:
  • Implement it and adapt it to the framework
  • Train from scratch with the TT100K dataset
  4. Boost performance:
  • Grid search over hyperparameters for ResNet
  • Refined ResNet fine-tuning over ImageNet weights to boost the performance on the TT100K dataset
  • Implemented the LR decay scheduler, which proved helpful in improving the performance of the networks
  • Tried data augmentation and different parameters on DenseNet (see the augmentation sketch after this list)
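Both the feature-wise preprocessing and the data augmentation experiments can be expressed with Keras's ImageDataGenerator. The sketch below shows the general idea, assuming in-memory arrays x_train / y_train and a compiled model; the parameter values are illustrative, not the ones used in the experiments.

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    featurewise_center=True,             # subtract the training-set mean
    featurewise_std_normalization=True,  # divide by the training-set std
    rotation_range=10,                   # small random rotations
    width_shift_range=0.1,               # horizontal shifts
    height_shift_range=0.1,              # vertical shifts
    horizontal_flip=False)               # flips would alter many traffic signs

# The feature-wise statistics must be computed on the training data
datagen.fit(x_train)

model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32,
                    epochs=30)
```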

Object detection

For object detection we have considered two single-shot approaches: the most recent version of You Only Look Once (YOLO), together with its smaller counterpart Tiny-YOLO, and the Single Shot MultiBox Detector (SSD). The YOLO models have been trained by fine-tuning pre-trained ImageNet weights, while SSD has been trained from scratch. All of these models have been trained to detect a variety of traffic signs in the TT100K detection dataset, and to detect pedestrians, cars and trucks in the Udacity dataset.

Contributions to the code

  • models/ssd300.py, ssd_utils.py and metrics.py - adaptation of this implementation of SSD300 to the framework, including the loss and batch generator utilities required to train it.

  • analyze_datasets.py - extended functionality to analyze detection datasets and report distributions over several variables.

  • eval_detection_fscore.py - extended to evaluate the SSD model. Included options to control the detection and NMS thresholds, added an option to store the predictions for the first image in each processed chunk, and generalized the script to ignore specific classes so that they are not taken into account when computing the metrics (a simplified sketch of the thresholding and NMS step follows this list).
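As context for the two thresholds mentioned above, a simplified version of the confidence filtering and greedy non-maximum suppression applied to a single class might look like this. It is a plain NumPy sketch, not the script's exact code.

```python
import numpy as np

def filter_and_nms(boxes, scores, detection_threshold=0.5, nms_threshold=0.45):
    """Keep detections above the confidence threshold, then greedily
    suppress boxes whose IoU with a higher-scoring box exceeds the NMS
    threshold. Boxes are [x1, y1, x2, y2] rows for a single class."""
    keep_mask = scores >= detection_threshold
    boxes, scores = boxes[keep_mask], scores[keep_mask]

    order = np.argsort(scores)[::-1]  # highest score first
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        # Intersection of the best box with the remaining candidates
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                 (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= nms_threshold]  # drop overlapping boxes
    return boxes[kept], scores[kept]
```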

Milestones

  1. YOLO:
  • Fine-tune from ImageNet weights on the TT100K detection dataset
  • Fine-tune from ImageNet weights on the Udacity dataset
  • Evaluate performance on the TT100K detection dataset
  • Evaluate performance on the Udacity dataset
  2. Tiny YOLO:
  • Fine-tune from ImageNet weights on the TT100K detection dataset
  • Fine-tune from ImageNet weights on the Udacity dataset
  • Evaluate performance on the TT100K detection dataset
  • Evaluate performance on the Udacity dataset
  • Compare results and performance between Tiny YOLO and YOLO
  3. SSD:
  • Implement it and adapt it to the framework
  • Train from scratch on the TT100K detection dataset
  • Train from scratch on the Udacity dataset
  • Evaluate performance on the TT100K detection dataset
  • Evaluate performance on the Udacity dataset
  4. Dataset Analysis:
  • Analyze the TT100K detection dataset: distribution of classes, bounding box aspect ratios and bounding box areas per dataset split
  • Analyze the Udacity dataset: distribution of classes, bounding box aspect ratios and bounding box areas per dataset split
  • Assess similarities and differences between splits of the Udacity dataset
  5. Boost performance:
  • Fine-tune Tiny YOLO from baseline weights on TT100K detection
  • Fine-tune Tiny YOLO with preprocessing and data augmentation techniques to overcome the differences between dataset splits in the Udacity dataset, thus improving the model's performance on this dataset

Semantic Segmentation

For the semantic segmentation task we have implemented and tested SegNet, DeepLabv2, the multi-scale context aggregation network based on dilated convolutions (DilatedNet) and Tiramisu. We also compare the results against FCN8.

Contributions to the code

  • models/segnet.py - implementation from scratch of both the VGG and basic versions, following the original paper, the Caffe SegNet code and the Caffe SegNet basic code

  • models/deeplabV2.py - adaptation of this implementation of DeepLabv2 to the framework, adding L2 regularization for the weights

  • models/tiramisu.py - implementation based on the Theano / Lasagne code from the original paper

  • models/dilation.py - adaptation of this implementation

  • initializations/initializations.py - added an Identity initialization (sketched after this list)

  • analyze_datasets.py - extended the implementation to analyze segmentation datasets
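The Identity initialization added for DilatedNet follows the idea in Yu & Koltun's paper, where the context module's layers start out as identity mappings. Below is a rough sketch for TensorFlow-ordered convolution kernels; the actual code in initializations/initializations.py may differ.

```python
import numpy as np
import keras.backend as K

def identity_init(shape, name=None, dtype='float32'):
    """Identity initialization for a 2D convolution kernel with shape
    (rows, cols, channels_in, channels_out): the kernel's central tap
    maps each input channel to the matching output channel, so the
    layer initially behaves as an identity function."""
    rows, cols, c_in, c_out = shape
    weights = np.zeros(shape, dtype=dtype)
    for i in range(min(c_in, c_out)):
        weights[rows // 2, cols // 2, i, i] = 1.0
    return K.variable(weights, name=name)
```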

Milestones

  1. FCN8:
  • Read paper
  • Train network on the CamVid dataset
  • Train network on the CityScapes dataset
  • Evaluate performance on the CamVid dataset
  • Evaluate performance on the CityScapes dataset
  2. SegNet:
  • Read paper
  • Implement network in the framework (VGG and basic versions)
  • Train network on the CamVid dataset
  • Boost performance
  • Evaluate performance on the CamVid dataset
  3. DeepLabv2:
  • Read paper
  • Implement network in the framework
  • Train network on the CamVid dataset
  • Boost performance
  • Evaluate performance on the CamVid dataset
  4. DilatedNet:
  • Read paper
  • Implement network in the framework
  • Train network on the CamVid dataset
  • Boost performance
  • Evaluate performance on the CamVid dataset
  5. Tiramisu:
  • Read paper
  • Implement network in the framework
  • Train network on the CamVid dataset
  • Boost performance
  • Evaluate performance on the CamVid dataset
  6. Dataset Analysis:
  • Analyze the distribution of classes across all data splits for all the available segmentation datasets: CamVid, CityScapes, KITTI, Pascal2012, Polyps and Synthia CityScapes (a sketch of the per-class pixel count follows this list)
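For reference, the per-class distribution of a segmentation split boils down to counting labeled pixels. A minimal sketch, assuming integer-labeled masks loaded as NumPy arrays (labels at or above num_classes, such as a void label, are dropped):

```python
import numpy as np

def class_pixel_distribution(masks, num_classes):
    """Normalized per-class pixel frequencies over one dataset split.
    `masks` is an iterable of 2-D arrays of integer class ids."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for mask in masks:
        # bincount over all pixels; ids >= num_classes (e.g. void) are dropped
        counts += np.bincount(mask.ravel(), minlength=num_classes)[:num_classes]
    return counts / float(counts.sum())
```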

Experimental results

Prior to choosing our final system, we carried out several experiments with different architectures, parameters and datasets. A summary of the experiments can be found here.


References

[1] V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561, 2015.

[2] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba. End to End Learning for Self-Driving Cars. arXiv:1604.07316 [cs], Apr. 2016. arXiv: 1604.07316.

[3] C. Chen, A. Seff, A. Kornhauser, and J. Xiao. Deepdriving: Learning affordance for direct perception in autonomous driving. In The IEEE International Conference on Computer Vision (ICCV), December 2015.

[4] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. arXiv:1606.00915 [cs], June 2016. arXiv: 1606.00915.

[5] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

[6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.

[7] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten. Densely Connected Convolutional Networks. Aug. 2016. arXiv: 1608.06993.

[8] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. arXiv preprint arXiv:1611.09326, 2016.

[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.

[10] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single Shot MultiBox Detector. arXiv:1512.02325 [cs], 9905:21–37, 2016. arXiv: 1512.02325.

[11] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.

[12] M. Mathias, R. Timofte, R. Benenson, and L. Van Gool. Traffic sign recognition – how far are we from the solution? International Joint Conference on Neural Networks (IJCNN), 2013.

[13] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. CoRR, abs/1506.02640, 2015.

[14] J. Redmon and A. Farhadi. YOLO9000: better, faster, stronger. CoRR, abs/1612.08242, 2016.

[15] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv:1506.01497 [cs], June 2015. arXiv: 1506.01497.

[16] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. Sept. 2014. arXiv: 1409.0575.

[17] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.

[18] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Computer Vision and Pattern Recognition (CVPR), 2015.

[19] F. Yu and V. Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv:1511.07122 [cs], Nov. 2015. arXiv: 1511.07122.

[20] Z. Zhu, D. Liang, S. Zhang, X. Huang, B. Li, and S. Hu. Traffic-sign detection and classification in the wild. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
