stanford-drone-semantic-segmentation

Semantic segmentation for the Stanford Drone Dataset.

Solution Brief:

Bounding boxes are used as-is as the ground-truth masks for training the semantic segmentation model.
The loss is cross-entropy.
The metric is mean IoU.
Transfer learning is used for the U-Net encoder: the encoder is frozen and only the decoder is trained.
Augmentation: random left-right flip and random shift.
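
As a rough illustration of the first and third points, the sketch below rasterizes bounding boxes into an integer class mask and computes mean IoU between two masks. It is not the repository's code: the function names `boxes_to_mask` and `mean_iou`, the `(xmin, ymin, xmax, ymax, class_id)` box layout, the exclusive `xmax`/`ymax` convention, and the choice to skip classes absent from both masks are all assumptions made for this example.

```python
import numpy as np


def boxes_to_mask(boxes, height, width, background=0):
    """Rasterize boxes into an integer mask (hypothetical helper).

    Each box is assumed to be (xmin, ymin, xmax, ymax, class_id) in pixels,
    with xmax and ymax treated as exclusive.
    """
    mask = np.full((height, width), background, dtype=np.int64)
    for xmin, ymin, xmax, ymax, class_id in boxes:
        # Clip the box to the frame so out-of-range boxes do not raise.
        x0, y0 = max(0, xmin), max(0, ymin)
        x1, y1 = min(width, xmax), min(height, ymax)
        if x1 > x0 and y1 > y0:
            # Later boxes overwrite earlier ones where they overlap.
            mask[y0:y1, x0:x1] = class_id
    return mask


def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union across classes (hypothetical helper).

    Classes that appear in neither mask are skipped rather than counted
    as IoU 0 or 1; this is one common convention, assumed here.
    """
    ious = []
    for c in range(num_classes):
        t = y_true == c
        p = y_pred == c
        union = np.logical_or(t, p).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(t, p).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


# A 4x4 frame with one 2x2 box of class 1, compared to an all-background guess.
true_mask = boxes_to_mask([(1, 1, 3, 3, 1)], height=4, width=4)
pred_mask = np.zeros((4, 4), dtype=np.int64)
score = mean_iou(true_mask, pred_mask, num_classes=2)
```

In this toy case the background class has IoU 12/16 and the box class has IoU 0, so `score` is 0.375. Because every pixel inside a box is labeled with the box class, such masks overestimate the object area.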

For a sample frame, the true mask looks like the figure below.

The figure below shows the mask predicted for this frame by the MobileNet U-Net model before training.

After training on a small amount of data, the predicted mask looks like the figure below.

Possible improvements:

Hyperparameter optimization.
Loss function from TernausNet.
Treating the bounding boxes as a poor semantic segmentation and improving it with self-training [1, 2, 3, 4].
Improving the semantic segmentation using video prediction.
Manually creating true semantic segmentation masks for part of the data and training a neural network with semi-supervised learning, or semi-supervised learning + GAN.
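
For reference, the TernausNet loss mentioned in the list combines binary cross-entropy H with a soft Jaccard index J as L = H - log J. The NumPy sketch below shows the binary (foreground vs. background) form, with J computed over the whole mask. The function name `ternaus_loss` and the `eps` smoothing constant are assumptions made for this example, and how the loss would be adapted to this project's classes is left open.

```python
import numpy as np


def ternaus_loss(y_true, y_prob, eps=1e-7):
    """Binary cross-entropy minus log of the soft Jaccard index.

    y_true: array of 0/1 labels; y_prob: predicted foreground probabilities.
    Hypothetical helper; eps is an assumed smoothing constant.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip probabilities so the logarithms stay finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1.0 - eps)

    # H: mean binary cross-entropy over all pixels.
    bce = -np.mean(
        y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    )

    # J: soft (differentiable) Jaccard index using probabilities, not hard labels.
    intersection = np.sum(y_true * y_prob)
    union = np.sum(y_true) + np.sum(y_prob) - intersection
    jaccard = (intersection + eps) / (union + eps)

    return float(bce - np.log(jaccard))


labels = np.array([[1, 0], [0, 1]])
uncertain = ternaus_loss(labels, np.full((2, 2), 0.5))
perfect = ternaus_loss(labels, labels)
```

A perfect prediction gives a loss close to zero, while the uniform 0.5 prediction is penalized both by the cross-entropy term and by the low overlap term.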
