KostasStefanidis / Semantic-Segmentation

Semantic Segmentation for Urban Scene understanding - Cityscapes dataset


Semantic Segmentation on the Cityscapes Dataset

For a detailed overview of the dataset, visit the Cityscapes website and the Cityscapes GitHub repository.

This repository focuses solely on the Pixel-Level Semantic Labeling Task of the Cityscapes dataset.

 


Script usage

E.g.: train a DeepLabV3plus model named MyDeepLabV3plus with an EfficientNetV2B0 backbone, Dice Loss as the loss function, a batch size of 1, the relu activation function, and a dropout rate of 0.1 for the Dropout layers, for 60 epochs.

  1. Train the model

    > python3 train_model.py --data_path /path/to/dataset --model_type DeepLabV3plus --model_name MyDeepLabV3plus --backbone EfficientNetV2B0 --loss DiceLoss --batch_size 1 --activation relu --dropout 0.1 --epochs 60
    
  2. Evaluate the model on the validation set.

    • Evaluate the MeanIoU
    • Evaluate the IoU of every class separately
    • Generate the confusion matrix for the validation set (a sketch of these metrics appears after the directory list below)
    > python3 evaluate_model.py --data_path /path/to/dataset --model_type DeepLabV3plus --model_name MyDeepLabV3plus --backbone EfficientNetV2B0
    
  3. Create predictions for the validation and test sets

    Perform inference on the validation set and save the predicted images

    > python3 create_predictions.py --data_path /path/to/dataset --model_type DeepLabV3plus --model_name MyDeepLabV3plus --backbone EfficientNetV2B0 --split "val"
    

    Perform inference on the test set and save the predicted images

    > python3 create_predictions.py --data_path /path/to/dataset --model_type DeepLabV3plus --model_name MyDeepLabV3plus --backbone EfficientNetV2B0 --split "test"
    

    Predictions are saved under the predictions/<model-type>/<model-name>/<split> directory. For the example above, the following 4 directories are created:

    • predictions/DeepLabV3plus/MyDeepLabV3plus/val/rgb
    • predictions/DeepLabV3plus/MyDeepLabV3plus/val/grayscale
    • predictions/DeepLabV3plus/MyDeepLabV3plus/test/rgb
    • predictions/DeepLabV3plus/MyDeepLabV3plus/test/grayscale
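
For reference, the kind of metrics evaluate_model.py reports can be computed with standard TensorFlow ops. The sketch below is purely illustrative (dummy tensors, an assumed number of classes), not the repository's implementation:

```python
import tensorflow as tf

NUM_CLASSES = 20  # assumption: 19 Cityscapes train classes + void

# Dummy integer class maps standing in for ground truth and predictions.
y_true = tf.random.uniform((1, 256, 256), maxval=NUM_CLASSES, dtype=tf.int32)
y_pred = tf.random.uniform((1, 256, 256), maxval=NUM_CLASSES, dtype=tf.int32)

# MeanIoU averaged over all classes.
miou = tf.keras.metrics.MeanIoU(num_classes=NUM_CLASSES)
miou.update_state(y_true, y_pred)
print('MeanIoU:', float(miou.result()))

# Confusion matrix, from which the per-class IoU follows:
# IoU_c = TP_c / (TP_c + FP_c + FN_c)
cm = tf.math.confusion_matrix(tf.reshape(y_true, [-1]),
                              tf.reshape(y_pred, [-1]),
                              num_classes=NUM_CLASSES)
tp = tf.linalg.diag_part(cm)
iou_per_class = tp / (tf.reduce_sum(cm, 0) + tf.reduce_sum(cm, 1) - tp)
```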

 

The RGB predictions look like the following: (example prediction image)

 

The run.sh script performs model training and evaluation by default, and can optionally make predictions on the test set.

This script invokes the Python scripts above and, when the predict flag is set, also adds all the logs and predictions of the given model to a zip archive.

> ./run.sh -d /path/to/dataset -t DeepLabV3plus -n MyDeepLabV3plus -b EfficientNetV2B0

 


Dataset Utilities

1. File parsing and decoding

Parse files that are organized under the following directory structure:


<data_path> : the root directory of the Cityscapes dataset
|
├── gtFine_trainvaltest
│   └── gtFine
│       ├── test
│       │   ├── berlin
│       │   ├── bielefeld
│       │   ├── bonn
│       │   ├── leverkusen
│       │   ├── mainz
│       │   └── munich
│       ├── train
│       │   ├── aachen
│       │   ├── bochum
│       │   ├── bremen
│       │   ├── cologne
│       │   ├── darmstadt
│       │   ├── dusseldorf
│       │   ├── erfurt
│       │   ├── hamburg
│       │   ├── hanover
│       │   ├── jena
│       │   ├── krefeld
│       │   ├── monchengladbach
│       │   ├── strasbourg
│       │   ├── stuttgart
│       │   ├── tubingen
│       │   ├── ulm
│       │   ├── weimar
│       │   └── zurich
│       └── val
│           ├── frankfurt
│           ├── lindau
│           └── munster
└── leftImg8bit_trainvaltest
    └── leftImg8bit
        ├── test
        │   ├── berlin
        │   ├── bielefeld
        │   ├── bonn
        │   ├── leverkusen
        │   ├── mainz
        │   └── munich
        ├── train
        │   ├── aachen
        │   ├── bochum
        │   ├── bremen
        │   ├── cologne
        │   ├── darmstadt
        │   ├── dusseldorf
        │   ├── erfurt
        │   ├── hamburg
        │   ├── hanover
        │   ├── jena
        │   ├── krefeld
        │   ├── monchengladbach
        │   ├── strasbourg
        │   ├── stuttgart
        │   ├── tubingen
        │   ├── ulm
        │   ├── weimar
        │   └── zurich
        └── val
            ├── frankfurt
            ├── lindau
            └── munster

Each of the train, val, and test directories contains subdirectories named after a city. To use a whole split, subfolder='all' must be passed to the Dataset.create() method so that images are read from all the subfolders. For testing purposes, a smaller number of images can be used by passing subfolder='<CityName>'. For example, passing split='train' to the Dataset() constructor and subfolder='aachen' to the create() method makes the Dataset object read only the 174 images in the aachen folder and convert them into a tf.data.Dataset. You can choose either all the subfolders or exactly one of them, but not an arbitrary combination of them. After the images (x) and the ground truth images (y) are read and decoded, they are combined into a single (x, y) object.
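
A minimal usage sketch, assuming the argument names described above (the import path and the data_path parameter are illustrative guesses, not necessarily the repository's exact API):

```python
from dataset import Dataset  # hypothetical import path

# Read only the 174 images of the 'aachen' subfolder from the training split.
train_ds = Dataset(data_path='/path/to/dataset', split='train')
aachen_only = train_ds.create(subfolder='aachen')  # -> tf.data.Dataset

# Read the whole training split.
full_train = train_ds.create(subfolder='all')
```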

 

2. Preprocessing

Generally, images have a shape of (batch_size, height, width, channels).

  1. Split the image into smaller patches with spatial resolution (256, 256). Because every image has a spatial resolution of (1024, 2048), each image yields (1024/256) × (2048/256) = 4 × 8 = 32 patches, which comprise a single batch. This means that when the patching technique is used, the batch size is fixed to 32. After this operation the images have a shape of (32, 256, 256, 3), while the ground truth images have a shape of (32, 256, 256, 1). To enable patching, set the use_patches argument of the create() method to True. (The sketch after this list illustrates the operation.)

 

  2. Perform data augmentation

    • Randomly perform horizontal flipping of images
    • Randomly adjust brightness
    • Randomly adjust contrast
    • Apply Gaussian blur with a random kernel size and variance

    NOTE: while all augmentations are performed on the input images, only the horizontal flip is applied to the ground truth images, because changing the pixel values of a ground truth image would change the classes its pixels belong to.

 

  3. Normalize images:
    • By default, the input pixel values are scaled to the range -1 to 1.
    • If a pretrained backbone is used, normalize according to what the pretrained network expects at its input. To determine which preprocessing is applied to the images, the name of the pretrained network family must be passed as the preprocessing argument of the Dataset constructor. For example, if a model from the EfficientNet family (e.g. EfficientNetB0, EfficientNetB1, etc.) is used as a backbone, then preprocessing = "EfficientNet" must be passed.

 

  4. Preprocess ground truth images:
    • Map eval IDs to train IDs
    • Convert to one-hot encoding
    • After this operation, ground truth images have a shape of (batch_size, 1024, 2048, num_classes)

Finally, the resulting dataset consists of (image, ground_truth) elements with shape ((batch_size, height, width, 3), (batch_size, height, width, num_classes)).
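
To make the shapes and ordering concrete, here is a small TensorFlow illustration of the augmentation, normalization, patching, and one-hot steps described above. It is a sketch, not the repository's code: NUM_CLASSES, the function boundaries, and the augmentation parameters are assumptions, and the Gaussian blur and eval-ID-to-train-ID mapping are omitted for brevity.

```python
import tensorflow as tf

NUM_CLASSES = 20  # assumption: 19 Cityscapes train classes + void

def augment(image, label):
    # Flip image and mask together so they stay spatially aligned;
    # photometric changes are applied to the image only (see the NOTE above).
    both = tf.image.random_flip_left_right(tf.concat([image, label], axis=-1))
    image, label = both[..., :3], both[..., 3:]
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return image, label

def to_patches(x, patch=256):
    # (1024, 2048, C) -> (32, 256, 256, C): a 4 x 8 grid of tiles.
    c = x.shape[-1]
    x = tf.reshape(x, (1024 // patch, patch, 2048 // patch, patch, c))
    x = tf.transpose(x, (0, 2, 1, 3, 4))
    return tf.reshape(x, (-1, patch, patch, c))

def preprocess(image, label):
    # image: (1024, 2048, 3) uint8, label: (1024, 2048, 1) integer class IDs.
    image = tf.cast(image, tf.float32) / 127.5 - 1.0  # scale to [-1, 1]
    image, label = augment(image, tf.cast(label, tf.float32))
    label = tf.one_hot(tf.cast(tf.squeeze(label, -1), tf.int32), NUM_CLASSES)
    return to_patches(image), to_patches(label)
```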

 


Segmentation Models

| Model | Reference |
| --- | --- |
| U-net | U-Net: Convolutional Networks for Biomedical Image Segmentation |
| Residual U-net | - |
| Attention U-net | Attention U-Net: Learning Where to Look for the Pancreas; CBAM: Convolutional Block Attention Module |
| U-net++ | UNet++: A Nested U-Net Architecture for Medical Image Segmentation |
| DeepLabV3+ | Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation |

Using an ImageNet-pretrained backbone is supported only for U-net, Residual U-net, and DeepLabV3+.
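
As an illustration of what this means in practice, an ImageNet-pretrained Keras application can serve as the encoder, with its feature maps consumed by the decoder. A minimal sketch (the input size and choice of output layer are illustrative, not the repository's architecture code):

```python
import tensorflow as tf

# Pretrained encoder: EfficientNetV2B0 with ImageNet weights, no classifier head.
backbone = tf.keras.applications.EfficientNetV2B0(
    include_top=False, weights='imagenet', input_shape=(256, 256, 3))

# The deepest feature map; a decoder would upsample this (and shallower
# skip features) back to the input resolution to produce per-pixel logits.
features = backbone.output  # shape (None, 8, 8, 1280) for a 256x256 input
```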

 

Supported Network families as backbone choices:

| Network Family | Reference |
| --- | --- |
| ResNet | Deep Residual Learning for Image Recognition |
| ResNetV2 | Identity Mappings in Deep Residual Networks |
| EfficientNet | EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks |
| EfficientNetV2 | EfficientNetV2: Smaller Models and Faster Training |
| MobileNet | MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications |
| MobileNetV2 | MobileNetV2: Inverted Residuals and Linear Bottlenecks |
| MobileNetV3 | Searching for MobileNetV3 |
| RegNetX & RegNetY | Designing Network Design Spaces |

 

Segmentation Losses

| Loss | Description | Reference |
| --- | --- | --- |
| IoU Loss | Loss based on the IoU (Intersection over Union) metric | Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression |
| Dice Loss | Loss based on the Dice coefficient (F1 score) | Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations |
| Tversky Loss | Generalized loss function based on the Tversky index to address the issue of data imbalance | Tversky loss function for image segmentation using 3D fully convolutional deep networks |
| Focal Tversky Loss | Generalized focal loss function based on the Tversky index | A Novel Focal Tversky loss function with improved Attention U-Net for lesion segmentation |
| Hybrid Loss | Combines a region-based loss (Dice) with a distribution-based loss (crossentropy) | Unified Focal loss: Generalising Dice and cross entropy-based losses to handle class imbalanced medical image segmentation |
| Focal Hybrid Loss | Focal variant of the Hybrid Loss | Unified Focal loss: Generalising Dice and cross entropy-based losses to handle class imbalanced medical image segmentation |
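
For intuition, here is a minimal sketch of a multi-class soft Dice loss of the kind listed above; the repository's exact formulation (smoothing constant, class weighting, averaging) may differ.

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-6):
    # y_true: one-hot ground truth, y_pred: softmax probabilities,
    # both shaped (batch, height, width, num_classes).
    axes = (1, 2)  # aggregate over the spatial dimensions, per class
    intersection = tf.reduce_sum(y_true * y_pred, axis=axes)
    union = tf.reduce_sum(y_true, axis=axes) + tf.reduce_sum(y_pred, axis=axes)
    dice = (2.0 * intersection + smooth) / (union + smooth)
    return 1.0 - tf.reduce_mean(dice)  # average over classes and batch
```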

License: GNU General Public License v3.0

