wjf5203 / SeqFormer

SeqFormer: Sequential Transformer for Video Instance Segmentation (ECCV 2022 Oral)

Why 42 classes?

acaelles97 opened this issue

First of all, congratulations on the nice work!
I wanted to ask why the number of classes is 42 when YouTube-VIS only has 40 classes. One extra class is used for the background, but what about the other one?

I also don't understand why you include the background class at all if you use focal loss. The original Deformable DETR focal loss implementation has no explicit background class: a prediction is background simply when the sigmoid probabilities of all classes are below 0.5.
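
For reference, here is a minimal sketch of that style of sigmoid focal loss (as in Deformable DETR / RetinaNet); the function and variable names are illustrative, not taken from this repo. Background needs no class slot because an unmatched query simply gets an all-zero target row:

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # logits:  (N, num_classes) raw scores, one independent sigmoid per class
    # targets: (N, num_classes) float one-hot labels; background = all zeros
    prob = logits.sigmoid()
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = prob * targets + (1 - prob) * (1 - targets)
    loss = ce * ((1 - p_t) ** gamma)               # down-weight easy examples
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * loss).mean(1).sum()

# 4 queries, 40 classes; only query 0 is matched to a ground-truth object.
logits = torch.randn(4, 40)
targets = torch.zeros(4, 40)
targets[0, 3] = 1.0
loss = sigmoid_focal_loss(logits, targets)
```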

Thanks a lot for your help!

Hi, thanks for your attention.
Yes, 40 classes are enough for focal loss. We kept an extra background class and a vanishing-object class for experimentation, which is why the number is 42.
It can be changed to 40 with no other modifications; sorry for the confusion.
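
As a concrete illustration (a minimal sketch assuming a Deformable-DETR-style classification head; the actual SeqFormer variable names may differ), dropping the two extra classes just means building the head with 40 outputs:

```python
import torch.nn as nn

NUM_YTVIS_CLASSES = 40   # the 40 YouTube-VIS categories; no background slot
hidden_dim = 256         # illustrative value

# With sigmoid focal loss, each output is an independent per-class score,
# so the head only needs one logit per real category.
class_embed = nn.Linear(hidden_dim, NUM_YTVIS_CLASSES)
```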

Thanks for your answer!
Does the performance stay the same without these two extra classes?

Yes, this has no impact on performance.