Solving the semantic segmentation task for the Stanford Drone Dataset
- The bounding boxes are used unchanged as the target masks when training the semantic segmentation mask prediction.
- The loss is cross-entropy.
- The metric is mean IoU.
- Transfer learning is used for the U-Net encoder: the encoder is frozen and only the decoder is trained.
- Augmentation: random left-right flip and random shift.
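The data preparation listed above (bounding boxes used directly as masks, then a random left-right flip and a random shift) could look roughly like the NumPy sketch below. It is only an illustration, not the code used in this project: the box layout `(xmin, ymin, xmax, ymax, class_id)`, the function names, the `max_shift` parameter, and the zero fill for pixels uncovered by the shift are all assumptions.

```python
import numpy as np


def boxes_to_mask(boxes, height, width):
    """Rasterize bounding boxes into an integer class mask.

    boxes: iterable of (xmin, ymin, xmax, ymax, class_id) in pixel
    coordinates (hypothetical layout, xmax/ymax exclusive).
    Class 0 is treated as background.
    """
    mask = np.zeros((height, width), dtype=np.int64)
    for xmin, ymin, xmax, ymax, class_id in boxes:
        # Clip to the frame so boxes that partly leave the image stay valid.
        x0, x1 = max(0, int(xmin)), min(width, int(xmax))
        y0, y1 = max(0, int(ymin)), min(height, int(ymax))
        mask[y0:y1, x0:x1] = class_id  # later boxes overwrite earlier ones
    return mask


def shift(array, dy, dx):
    """Translate an (H, W, ...) array by (dy, dx), zero-filling new pixels."""
    h, w = array.shape[:2]
    dy = int(np.clip(dy, -h, h))
    dx = int(np.clip(dx, -w, w))
    out = np.zeros_like(array)
    src = array[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    y0, x0 = max(0, dy), max(0, dx)
    out[y0:y0 + src.shape[0], x0:x0 + src.shape[1]] = src
    return out


def augment(image, mask, rng, max_shift=8):
    """Apply the same random left-right flip and shift to image and mask."""
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return shift(image, dy, dx), shift(mask, dy, dx)
```

The important design point is that the image and its mask receive exactly the same flip and shift; otherwise the target no longer lines up with the pixels it labels.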
For one example frame, the true mask looks like the figure below.
The figure below shows the mask predicted for this frame by the MobileNet U-Net model before training.
After training on a small amount of data, the predicted mask looks like the figure below.
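To put a number on how far a predicted mask is from the true mask, the cross-entropy loss and the mean IoU metric named earlier can be computed per frame. The sketch below is a plain NumPy illustration, not the project's implementation: the function names, the `(H, W, C)` probability layout, and the choice to skip classes absent from both masks are assumptions.

```python
import numpy as np


def cross_entropy(probs, target, eps=1e-7):
    """Mean pixel-wise cross-entropy.

    probs:  (H, W, C) per-pixel class probabilities (e.g. softmax output).
    target: (H, W) integer class ids.
    """
    # Pick, for every pixel, the probability assigned to its true class.
    p_true = np.take_along_axis(probs, target[..., None], axis=-1)
    return float(-np.log(np.clip(p_true, eps, 1.0)).mean())


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes.

    Classes that appear in neither mask are skipped (an assumption;
    other conventions count them as IoU 1 or 0).
    """
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```

For a prediction given as probabilities, the hard mask passed to `mean_iou` would be `probs.argmax(axis=-1)`.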
- Hyperparameter optimization.
- Using the loss function from TernausNet.
- Treating the bounding boxes as a poor semantic segmentation and improving it with self-training [1, 2, 3, 4].
- Improving semantic segmentation using video prediction.
- Manually creating true semantic segmentation masks for part of the data and training a neural network with semi-supervised learning, or with semi-supervised learning combined with a GAN.
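As a starting point for the TernausNet loss idea above: as far as I understand the TernausNet paper, its loss combines binary cross-entropy H with a soft Jaccard index J as L = H - log J. The NumPy sketch below covers the binary (one foreground class) case only. The function name is hypothetical, and the sum-based form of the soft Jaccard and the `eps` smoothing are assumptions, so treat it as a sketch rather than the reference implementation.

```python
import numpy as np


def bce_minus_log_jaccard(probs, target, eps=1e-7):
    """Binary cross-entropy minus the log of a soft Jaccard index.

    probs:  (H, W) predicted foreground probabilities in [0, 1].
    target: (H, W) binary ground-truth mask (0 or 1).
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    target = target.astype(np.float64)

    # H: mean pixel-wise binary cross-entropy.
    bce = -(target * np.log(probs)
            + (1.0 - target) * np.log(1.0 - probs)).mean()

    # J: soft Jaccard, computed on probabilities instead of hard masks
    # (sum-based variant, an assumption of this sketch).
    intersection = (probs * target).sum()
    union = probs.sum() + target.sum() - intersection
    jaccard = (intersection + eps) / (union + eps)

    return float(bce - np.log(jaccard))
```

Compared with plain cross-entropy, the extra `-log J` term directly rewards overlap between the predicted and true regions, which is closer to what the mean IoU metric measures.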