autonomous-vehicles autonomous-driving udacity-self-driving-car resnet-50 transfer-learning cnn-lstm conv3d transformer attention semantic-segmentation unet-image-segmentation unet setr multihead-attention steering-angle-prediction

AutoTruckX

An experimental project for autonomous vehicle driving perception with steering angle prediction and semantic segmentation.

Semantic Segmentation

A detailed description can be found in ./Semantic Segmentation/README.md.

SETR: A pure transformer encoder model paired with a variety of upsampling decoders to perform semantic segmentation. This model was adapted from and implemented based on the paper published in December 2020 by Sixiao Zheng et al., titled Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. In particular, the SETR-PUP and SETR-MLA variants (the models with progressive upsampling and multi-level feature aggregation decoders) were selected and implemented because of their state-of-the-art performance on benchmark datasets.
TransUNet: A UNet-transformer hybrid model that uses UNet to extract high-resolution feature maps, a transformer to tokenize and encode images, and a UNet-like decoder that upsamples using the previously extracted feature maps. This model was adapted from and implemented based on the paper published in February 2021 by Jieneng Chen et al., titled TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation.
UNet: the well-known UNet model. This 4-layer-deep variant of UNet was adapted and implemented based on the paper published in November 2018 by Ari Silburt et al., titled Lunar Crater Identification via Deep Learning.
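The core shape bookkeeping behind SETR-PUP can be sketched in a few lines. Following Zheng et al., the image is split into 16×16 patches, so an H×W input becomes a sequence of HW/256 tokens. The PUP decoder reshapes the encoder output into an (H/16)×(W/16) grid and restores full resolution through four 2× upsampling stages. This is a minimal sketch of the shape arithmetic only, not code from this repository; the helper name `setr_pup_shapes` and the 256×256 example input are hypothetical.

```python
def setr_pup_shapes(height, width, patch=16):
    """Token count and decoder feature-map sizes for a SETR-PUP-style model.

    Assumes non-overlapping square patches and a decoder that only ever
    upsamples by 2x per stage, so `patch` must be a power of two.
    """
    if patch < 1 or patch & (patch - 1):
        raise ValueError("patch size must be a power of two")
    if height % patch or width % patch:
        raise ValueError("input size must be divisible by the patch size")
    grid_h, grid_w = height // patch, width // patch
    seq_len = grid_h * grid_w  # one token per patch fed to the transformer
    sizes = [(grid_h, grid_w)]  # encoder output reshaped back into a 2D grid
    h, w = grid_h, grid_w
    while (h, w) != (height, width):
        h, w = h * 2, w * 2  # one PUP stage: conv followed by 2x upsampling
        sizes.append((h, w))
    return seq_len, sizes


print(setr_pup_shapes(256, 256))
# (256, [(16, 16), (32, 32), (64, 64), (128, 128), (256, 256)])
```

Restricting each stage to 2× upsampling is what makes the decoder "progressive": with 16×16 patches it always takes exactly four stages to get back to the input resolution.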

[Figure: architecture diagrams of SETR, TransUNet, and UNet]

The figures above are taken from the respective original papers.
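The encoder/decoder symmetry of a 4-level UNet can be illustrated by tracking feature-map sizes. This minimal sketch assumes that "4 layers deep" means four pooling levels, with 2×2 max pooling, 'same'-padded convolutions, and filter counts that double at each level starting from 64 as in Ronneberger et al.'s original UNet. The filter counts and input size used in this repository, and in Silburt et al.'s model, may differ, and `unet_shapes` is a hypothetical helper.

```python
def unet_shapes(size, depth=4, base_filters=64):
    """(spatial size, channels) at each level of a square-input UNet.

    Assumes 2x2 max pooling between encoder levels and 'same'-padded
    convolutions, so only pooling and upsampling change the spatial size.
    """
    encoder = []
    s, c = size, base_filters
    for _ in range(depth):
        if s % 2:
            raise ValueError("size must stay divisible by 2 at every level")
        encoder.append((s, c))  # this feature map is kept for a skip connection
        s, c = s // 2, c * 2    # pool 2x, double the filter count
    bottleneck = (s, c)
    # Each decoder level upsamples 2x, concatenates the matching encoder map,
    # and convolves back down to that level's channel count.
    decoder = list(reversed(encoder))
    return encoder, bottleneck, decoder


for level in unet_shapes(256):
    print(level)
# [(256, 64), (128, 128), (64, 256), (32, 512)]
# (16, 1024)
# [(32, 512), (64, 256), (128, 128), (256, 64)]
```

The decoder list mirrors the encoder list, which is exactly why each skip connection can concatenate feature maps of matching spatial size.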

Inference on test images and videos shows that the model performs reasonable semantic segmentation.

Steering Angle Prediction

A detailed description can be found in ./Steering Angle Prediction/README.md.

TruckNN: A CNN model adapted and modified from NVIDIA's 2016 paper End to End Learning for Self-Driving Cars. The original model was augmented with batch normalization layers and dropout layers.
TruckResnet50: A CNN transfer learning model that feeds feature maps extracted by ResNet50 into additional fully-connected layers. This model was adapted and modified from Du et al.'s 2019 paper Self-Driving Car Steering Angle Prediction Based on Image Recognition. The first 141 layers of ResNet50 (instead of the first 45 layers as in the original paper) were frozen, and the dimensions of the fully-connected layers were also modified.
TruckRNN: A Conv3D-LSTM model, also based on and modified from Du et al.'s 2019 paper mentioned above, was also tested. The model consumes a sequence of 15 consecutive frames as input and predicts the steering angle at the last frame. Compared to the original model, max-pooling layers were omitted and batch normalization layers were introduced. Five convolutional layers were implemented, with the last convolutional layer connected to a residual output, followed by two LSTM layers; this differs considerably from the architecture proposed in the paper.
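To see how the convolutional stack of TruckNN's parent architecture shrinks the input, the sketch below traces output shapes through the five convolutional layers of NVIDIA's 2016 network: 24, 36, and 48 filters with 5×5 kernels and stride 2, then two 64-filter 3×3 layers with stride 1, all unpadded, on a 66×200 input. Batch normalization and dropout, which TruckNN adds, do not change these shapes. Whether TruckNN keeps the same input size and filter counts is an assumption, and `conv_out` and `trace_shapes` are hypothetical helpers.

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


# (filters, kernel, stride) for the five conv layers of NVIDIA's 2016 network
NVIDIA_CONVS = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]


def trace_shapes(height=66, width=200, convs=NVIDIA_CONVS):
    """Return (channels, height, width) after each convolutional layer."""
    shapes = []
    h, w = height, width
    for filters, k, s in convs:
        h, w = conv_out(h, k, s), conv_out(w, k, s)
        shapes.append((filters, h, w))
    return shapes


shapes = trace_shapes()
filters, h, w = shapes[-1]
print(shapes[-1], filters * h * w)  # (64, 1, 18) 1152
```

The final 64×1×18 map flattens to 1,152 values, which is what the fully-connected layers consume; changing the input resolution changes this number, so the first fully-connected layer has to be sized accordingly.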

[Figure: architecture diagrams of TruckNN, TruckResnet50, and TruckRNN]

The figures above are taken from the respective original papers.
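TruckRNN's input format (15 consecutive frames labelled with the steering angle of the last frame) can be produced with a sliding window. This is a minimal sketch assuming a stride of one frame, so consecutive windows overlap by 14 frames; the repository's actual data loader may window the data differently, and `make_windows`, the file names, and the angle values are hypothetical.

```python
def make_windows(frames, angles, seq_len=15):
    """Pair every run of `seq_len` consecutive frames with the steering
    angle recorded at the run's last frame (stride of one frame)."""
    if len(frames) != len(angles):
        raise ValueError("frames and angles must have the same length")
    samples = []
    for end in range(seq_len - 1, len(frames)):
        window = frames[end - seq_len + 1 : end + 1]  # the seq_len frames ending at `end`
        samples.append((window, angles[end]))         # label = last frame's angle
    return samples


frames = [f"frame_{i:03d}.jpg" for i in range(20)]  # hypothetical file names
angles = [0.01 * i for i in range(20)]              # hypothetical angles
samples = make_windows(frames, angles)
print(len(samples))                         # 6
print(samples[0][0][0], samples[0][0][-1])  # frame_000.jpg frame_014.jpg
```

With 20 frames this yields 20 − 15 + 1 = 6 training samples; a sequence shorter than 15 frames yields none.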

For further visualization, saliency maps of the last ResNet50 convolutional layer (layer4) are shown below:

The saliency maps suggest that the model focuses on features on the road.
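The README does not say which saliency method produced these maps. One common choice for visualizing a convolutional layer such as layer4 is Grad-CAM (Selvaraju et al.), which weights each channel's activation map by the spatial average of the output's gradient with respect to that channel, sums the weighted maps, and keeps only positive values. The sketch below shows just that combination step in plain Python, assuming the activations and gradients have already been captured (for example with framework hooks). It illustrates Grad-CAM and is not necessarily the method used here; `grad_cam` and the toy values are hypothetical.

```python
def grad_cam(activations, gradients):
    """Combine one layer's activations and gradients into a Grad-CAM heatmap.

    activations, gradients: nested lists shaped [channels][height][width],
    e.g. captured from ResNet50's layer4 for a single image.
    """
    channels = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # channel weight = global average of the gradient over the spatial grid
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for k in range(channels):
        for i in range(h):
            for j in range(w):
                cam[i][j] += weights[k] * activations[k][i][j]
    # ReLU keeps only regions that increase the output, then scale to [0, 1]
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


activations = [[[1.0, 0.0], [0.0, 1.0]],     # channel 0
               [[0.0, 2.0], [2.0, 0.0]]]     # channel 1
gradients = [[[1.0, 1.0], [1.0, 1.0]],       # channel 0 -> weight 1.0
             [[-0.5, -0.5], [-0.5, -0.5]]]   # channel 1 -> weight -0.5
print(grad_cam(activations, gradients))  # [[1.0, 0.0], [0.0, 1.0]]
```

The resulting low-resolution map (7×7 for a 224×224 input to ResNet50) is normally upsampled to the image size and overlaid on the frame.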

About

An experimental project for autonomous vehicle driving perception with steering angle prediction and semantic segmentation using a combination of UNet, attention and transformers.

