fyu/dilation

Dilated Convolution for Semantic Image Segmentation

Home Page: https://www.vis.xyz/pub/dilation

Dealing with unlabeled pixels in KITTI dataset

selcuksandikci opened this issue

Hello Fisher,

First of all, great work, and thank you for making it publicly available. I have read your ICLR paper and have been trying to reproduce your results on the KITTI dataset. So far I have been able to train the front-end module with the training parameters recommended in your paper, and I have also trained the context module for KITTI.

I have used the dataset from the following link:

http://adas.cvc.uab.es/s2uad/?page_id=11

This dataset provides RGB images and corresponding ground-truth label images. There are 11 semantic classes, each represented by a specific color. What I noticed is that some pixels are black (i.e., RGB = [0, 0, 0]); I assume these are "void" regions that don't belong to any of the 11 classes. I treated these void regions as a new class, bringing the total number of classes to 12 during training.

After training completed, I compared the segmentation results of my trained models against your pretrained KITTI Dilation-7 model on the validation set. They indeed produce similar results, except for the "void" regions: your pretrained model never predicts a "void" class, whereas mine does. Obviously you don't have this "void" class in your experiments, so you must have handled it somehow.

How did you handle these "void" regions in the KITTI dataset? More generally, is it possible to ignore specific classes during training in your algorithm? If so, could you please describe how to achieve that?

Thanks in advance,

Selcuk

Hey @selcuksandikci ,

Can you please share the steps you took to train it? Or share the train prototxt?

I am trying to follow the steps from the tutorial, but my network doesn't seem to be learning anything.

Hello @atanas1054 ,

The training data needs to be converted to so-called "label images". Each pixel in a label image holds the class index of the corresponding pixel in the RGB image, i.e., a value in the range [0, N-1], where N is the number of classes you are interested in. Hopefully that is clear.
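For illustration, here is a minimal sketch of such a conversion. The palette below is hypothetical; the actual RGB-to-class mapping has to come from the documentation of the KITTI ground truth you downloaded.

```python
import numpy as np
from PIL import Image

# Hypothetical palette mapping ground-truth RGB colors to class indices.
# Replace these with the actual colors used by your dataset's annotations.
PALETTE = {
    (128, 128, 128): 0,   # e.g. sky
    (128, 0, 0): 1,       # e.g. building
    # ... one entry per remaining class ...
    (0, 0, 0): 255,       # black "void" pixels -> ignore label (see end of thread)
}

def rgb_to_label(rgb_path, label_path):
    """Convert a color-coded ground-truth image to a single-channel label image."""
    rgb = np.array(Image.open(rgb_path).convert("RGB"), dtype=np.uint8)
    # Default every pixel to the ignore label, so unknown colors are skipped.
    label = np.full(rgb.shape[:2], 255, dtype=np.uint8)
    for color, index in PALETTE.items():
        label[np.all(rgb == color, axis=-1)] = index
    Image.fromarray(label).save(label_path)  # save losslessly, e.g. as PNG
```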

After the conversion, just follow what Fisher describes in his paper. It should work out of the box.

To summarize, you have a label issue. You need to fix that and everything will work out fine.

@selcuksandikci Thank you for your reply. That was exactly what I was looking for.

Have you managed to solve the problem with the "void" regions? If yes, can you please share how?

I am currently predicting void regions as well, but my IoU numbers are lower than the ones in the paper because of that.

@atanas1054 I would say just ignore the void regions: set them to 255 in the label images and specify 255 as the ignore label in the SoftmaxWithLoss layer during training.
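For reference, the loss layer in the train prototxt would then look roughly like this (the bottom blob names are placeholders; use whatever your network's final score and label blobs are actually called):

```
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "final"   # placeholder: your network's final score blob
  bottom: "label"
  top: "loss"
  loss_param {
    ignore_label: 255  # pixels labeled 255 contribute no loss or gradient
  }
}
```

With this in place, the network never learns a separate "void" class, so it won't predict one at test time either.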