Semantic Segmentation

Introduction

This project labels each pixel of an image (a task known as semantic segmentation) using a Fully Convolutional Network (FCN). It applies transfer learning with VGG-16: the output of layer7 is extracted and passed through a 1x1 convolution, followed by several transposed convolutional layers combined with skip connections to upsample the result. The network architecture is shown below:

[Figure: network architecture]
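To make the decoder idea concrete, here is a minimal NumPy sketch of the data flow, not the project's actual TensorFlow code. Several things are assumptions for illustration only: the skip connections are taken from two earlier feature maps called `pool4` and `pool3` (the common FCN-8s layout; the text above does not say which layers are used), the shapes are toy sizes, and nearest-neighbour upsampling stands in for the learned transposed convolutions.

```python
import numpy as np


def conv1x1(features, weights):
    """1x1 convolution: the same linear map applied to every pixel's channel vector."""
    return np.einsum("hwc,ck->hwk", features, weights)


def upsample(x, factor):
    """Nearest-neighbour upsampling, used here as a stand-in for a transposed convolution."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)


def decode(layer7, pool4, pool3, weights):
    """Upsample class scores step by step, adding a skip connection at each scale.

    The pool4/pool3 skips follow the usual FCN-8s layout and are an assumption.
    """
    w7, w4, w3 = weights
    x = conv1x1(layer7, w7)                   # class scores at the coarsest scale
    x = upsample(x, 2) + conv1x1(pool4, w4)   # first skip connection
    x = upsample(x, 2) + conv1x1(pool3, w3)   # second skip connection
    return upsample(x, 8)                     # final upsampling step


rng = np.random.default_rng(0)
num_classes = 3                               # toy value
layer7 = rng.normal(size=(2, 3, 16))          # toy feature maps, not real VGG-16 sizes
pool4 = rng.normal(size=(4, 6, 8))
pool3 = rng.normal(size=(8, 12, 4))
weights = [rng.normal(size=(c, num_classes)) for c in (16, 8, 4)]

scores = decode(layer7, pool4, pool3, weights)
prediction = scores.argmax(axis=-1)           # per-pixel class index
```

Each skip connection adds scores computed from a higher-resolution feature map, which is what lets the upsampled output recover detail lost in the encoder.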

Setup

GPU

Please make sure your TensorFlow GPU support is enabled. If you don't have a GPU on your system, you can use AWS or another cloud computing platform. This project uses FloydHub.
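One quick way to check this is to ask TensorFlow for the name of the GPU device it sees. The sketch below is only a suggestion: the helper name `detect_gpu` is made up for this example, and it assumes `tf.test.gpu_device_name()`, which returns an empty string when no GPU is available.

```python
def detect_gpu():
    """Return the GPU device name TensorFlow reports, or None if there is none.

    Also returns None when TensorFlow itself is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # gpu_device_name() returns '' when TensorFlow cannot see a GPU.
    return tf.test.gpu_device_name() or None


device = detect_gpu()
if device is None:
    print("No GPU found (or TensorFlow is missing); training will be slow.")
else:
    print("Default GPU device:", device)
```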

Frameworks and Packages

Make sure you have the following installed:

Dataset

Download the KITTI Semantics pixel-level data from here. Extract the dataset in the data folder. This will create the folder data_semantics with all the training and test images.
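After extracting, a small check like the following can confirm the images landed in the right place. It is a sketch under two assumptions: the images are stored as `.png` files, and the default path `data/data_semantics` is relative to the directory you run it from. The function name `count_images` is made up for this example.

```python
from collections import Counter
from pathlib import Path


def count_images(root="data/data_semantics"):
    """Count .png files per subfolder under the extracted dataset root."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(
            f"{root} not found; extract the dataset into the data folder first"
        )
    counts = Counter(
        p.parent.relative_to(root).as_posix() for p in root.rglob("*.png")
    )
    return dict(counts)


if __name__ == "__main__":
    try:
        for folder, n in sorted(count_images().items()):
            print(f"{folder}: {n} images")
    except FileNotFoundError as err:
        print(err)
```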

Label Format

name , id , category
'unlabeled' , 0 , 'void'
'ego vehicle' , 1 , 'void'
'rectification border' , 2 , 'void'
'out of roi' , 3 , 'void'
'static' , 4 , 'void'
'dynamic' , 5 , 'void'
'ground' , 6 , 'void'
'road' , 7 , 'ground'
'sidewalk' , 8 , 'ground'
'parking' , 9 , 'ground'
'rail track' , 10 , 'ground'
'building' , 11 , 'construction'
'wall' , 12 , 'construction'
'fence' , 13 , 'construction'
'guard rail' , 14 , 'construction'
'bridge' , 15 , 'construction'
'tunnel' , 16 , 'construction'
'pole' , 17 , 'object'
'polegroup' , 18 , 'object'
'traffic light' , 19 , 'object'
'traffic sign' , 20 , 'object'
'vegetation' , 21 , 'nature'
'terrain' , 22 , 'nature'
'sky' , 23 , 'sky'
'person' , 24 , 'human'
'rider' , 25 , 'human'
'car' , 26 , 'vehicle'
'truck' , 27 , 'vehicle'
'bus' , 28 , 'vehicle'
'caravan' , 29 , 'vehicle'
'trailer' , 30 , 'vehicle'
'train' , 31 , 'vehicle'
'motorcycle' , 32 , 'vehicle'
'bicycle' , 33 , 'vehicle'
'license plate' , -1 , 'vehicle'
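For working with these labels in code, the rows can be parsed into a lookup table. The sketch below uses only a few rows copied from the table above (without the header row); the `Label` tuple and the `parse_labels` helper are names invented for this example and are not part of the dataset tooling.

```python
from collections import namedtuple

Label = namedtuple("Label", ["name", "id", "category"])

# A few rows copied from the table above, in the same "name , id , category" layout.
TABLE = """\
'unlabeled' , 0 , 'void'
'road' , 7 , 'ground'
'car' , 26 , 'vehicle'
'license plate' , -1 , 'vehicle'
"""


def parse_labels(text):
    """Parse rows of the form  'name' , id , 'category'  into Label tuples."""
    labels = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, label_id, category = [field.strip() for field in line.split(",")]
        labels.append(Label(name.strip("'"), int(label_id), category.strip("'")))
    return labels


labels = parse_labels(TABLE)
id_to_name = {label.id: label.name for label in labels}
id_to_category = {label.id: label.category for label in labels}
```

With the full table loaded, `id_to_category` can be used to collapse the fine-grained ids in a label image into the coarser categories.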

Data Augmentation

To train a more robust model, I implemented data augmentation: scaling images to different sizes, flipping, adding salt-and-pepper noise, and darkening. Together these increased the dataset to eight times its original size, which helps a lot. Below are some sample images:

[Figure: sample augmented images]
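The four augmentations can be sketched with NumPy as follows. This is an illustration, not the code used in this project: the function names, the noise amount, the darkening factor, and the nearest-neighbour scaling are all assumptions. Note that geometric transforms (flip, scale) must be applied to the label image as well, while photometric ones (noise, darkening) touch only the input image.

```python
import numpy as np


def flip_horizontal(image, label):
    """Mirror the image and its label map left to right."""
    return image[:, ::-1], label[:, ::-1]


def scale_nearest(image, label, factor):
    """Rescale image and label with nearest-neighbour sampling.

    Nearest-neighbour keeps label ids intact (no blending between classes).
    """
    h, w = label.shape[:2]
    new_h = max(1, int(round(h * factor)))
    new_w = max(1, int(round(w * factor)))
    rows = np.minimum((np.arange(new_h) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / factor).astype(int), w - 1)
    return image[rows][:, cols], label[rows][:, cols]


def salt_and_pepper(image, amount=0.02, rng=None):
    """Set roughly `amount` of the pixels to black or white (half each)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.copy()
    mask = rng.random(image.shape[:2])
    noisy[mask < amount / 2] = 0          # pepper
    noisy[mask > 1 - amount / 2] = 255    # salt
    return noisy


def darken(image, factor=0.6):
    """Multiply pixel intensities by a factor below 1."""
    return (image.astype(np.float32) * factor).astype(image.dtype)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)   # toy RGB image
label = rng.integers(0, 34, size=(4, 6), dtype=np.uint8)       # toy label map

flipped_image, flipped_label = flip_horizontal(image, label)
small_image, small_label = scale_nearest(image, label, 0.5)
noisy_image = salt_and_pepper(image, amount=0.1, rng=rng)
dark_image = darken(image)
```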