sgrvinod / a-PyTorch-Tutorial-to-Object-Detection

SSD: Single Shot MultiBox Detector | a PyTorch Tutorial to Object Detection


prior box not clipped as expected

AakiraOtok opened this issue

commented

[screenshot]

As you stated, prior boxes are supposed to be clipped if they fall outside the image, but in your code:

prior_boxes.clamp_(0, 1)  # (8732, 4)

prior_boxes is still in YOLO style (meaning [cx, cy, w, h]), but if we want to clip it, we would want something like this:

prior_boxes = pascalVOC_style(prior_boxes)  # convert to (xmin, ymin, xmax, ymax)
prior_boxes.clamp_(0, 1)
prior_boxes = yolo_style(prior_boxes)  # convert back to (cx, cy, w, h)

Am I right?
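For what it's worth, here is a self-contained sketch of that round trip in plain PyTorch; the conversion helpers are written out below (the repo's utils.py has similar ones, cxcy_to_xy and xy_to_cxcy), so pascalVOC_style / yolo_style above are just placeholders for them:

import torch

def cxcy_to_xy(cxcy):
    # (cx, cy, w, h) -> (xmin, ymin, xmax, ymax), all as fractions of the image size
    return torch.cat([cxcy[:, :2] - cxcy[:, 2:] / 2,
                      cxcy[:, :2] + cxcy[:, 2:] / 2], dim=1)

def xy_to_cxcy(xy):
    # (xmin, ymin, xmax, ymax) -> (cx, cy, w, h)
    return torch.cat([(xy[:, :2] + xy[:, 2:]) / 2,
                      xy[:, 2:] - xy[:, :2]], dim=1)

prior_boxes = torch.tensor([[0.1, 0.1, 0.3, 0.3]])  # center form; corners overshoot

xy = cxcy_to_xy(prior_boxes)  # tensor([[-0.0500, -0.0500,  0.2500,  0.2500]])
xy.clamp_(0, 1)               # tensor([[0.0000, 0.0000, 0.2500, 0.2500]])
prior_boxes = xy_to_cxcy(xy)  # back to center form: tensor([[0.1250, 0.1250, 0.2500, 0.2500]])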

That's an interesting point, but I think this is covered. Notice how the center-encoded coordinates are also in percentages of the image dimensions:

[screenshot of the relevant code]

So clamping between 0 and 1 should work the same.

commented

@AndreiMoraru123
[screenshot: prior boxes from the author's code extending outside the image]
w and h are just the size of the prior box. What I mean is that the top-left and bottom-right corners of the rectangle have to be recalculated from (cx, cy, w, h), and those corners can be negative or greater than 1, so the box can still lie outside the image.
Say (cx, cy) = (0.1, 0.1) and (w, h) = (0.3, 0.3). You can see that cx, cy, w, and h are all in the valid range, but the prior box still falls outside: xmin = 0.1 - 0.3/2 = -0.05. I showed the prior boxes from the author's code in the image above; you can see this happening.

I see your point. You might be onto something. Indeed, if you generate a prior box centered close enough to a corner of the image, you can easily overshoot the image with widths and heights that are themselves within bounds. For example, for a prior centered at 20% of a (500, 500) image, i.e. at pixel (100, 100), any (width, height) combo greater than 40% of the image will overshoot, and the clamp will do nothing about it.

cx, cy, w, h = 0.2, 0.2, 0.5, 0.5

[screenshot: the example prior box overshooting the image]

Maybe this is worth looking into.
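To make that concrete, a quick check in plain PyTorch (a minimal sketch, independent of the repo's code):

import torch

# One prior in center form, all values as fractions of a (500, 500) image.
prior = torch.tensor([[0.2, 0.2, 0.5, 0.5]])

# Clamping the center form is a no-op: every value is already in [0, 1].
assert torch.equal(prior.clamp(0, 1), prior)

# But the actual top-left corner falls outside the image:
top_left = prior[:, :2] - prior[:, 2:] / 2  # tensor([[-0.0500, -0.0500]])
print(top_left * 500)                       # tensor([[-25., -25.]]), i.e. 25 px past the corner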

commented

@AndreiMoraru123
Yup, as I stated, we can avoid it by first converting (cx, cy, w, h) to (xmin, ymin, xmax, ymax) and then clamping into the range [0, 1].
But I wonder how the author achieved the same performance as the paper using these prior boxes. Or did the original Caffe repo do the same thing?

commented

@sgrvinod sorry for bothering you, but I think this is a bug. Besides that, the pretrained model link seems to be dead.

Hi @NicolasHarding, I just looked at it, and I think you are right. Thanks for pointing this out. I was careless and the priors won't be clipped. Therefore, some of my priors are different from those in the paper.

I'm guessing I still get the same performance because there are always enough priors: even for objects at the edges of images, where the "clipped" sizes would matter, there's always a different prior that's close enough and/or the offset calculation is good enough.

I'm not completely sure how to proceed now, because immediately modifying my code to clip those priors would affect use of the pretrained model, which was trained with the different priors. I'll probably make a note of this issue in the code and tutorial in the next couple of days.

About the pretrained model: I could access it, and I know some others could as well. Could you check again? I just changed the permissions of the Google Drive folder above it as well. If it still doesn't work, let me know and I'll make it available elsewhere. @AndreiMoraru123, do you have the same problem?

The link to the torch checkpoint works from Chrome. Google Drive is always iffy on other browsers for reasons, @NicolasHarding. Also, @sgrvinod, I know you made these tutorials years ago, but now I'm thinking Hugging Face would be the best place to upload models and get more exposure for your work.

@AndreiMoraru123 thanks for confirming! I'll check out HuggingFace, this idea never occurred to me, thank you.

commented

@sgrvinod I sincerely apologize for bothering you multiple times, but it seems that this is not correct. In class MultiBoxLoss:

self.smooth_l1 = nn.L1Loss()

nn.L1Loss() is not actually smooth L1 loss.


Hi @AakiraOtok, it's not a bother at all. Yeah, I was aware of this. Thanks for reminding me. See #60. I'll make a note of this difference from the paper in the tutorial and code, but probably won't change the loss function now as L1 loss appears to work just as well.
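For anyone comparing the two losses side by side: nn.SmoothL1Loss is quadratic for errors below 1 and linear beyond, while nn.L1Loss is linear everywhere. A minimal check in plain PyTorch (not the repo's code):

import torch
import torch.nn as nn

pred = torch.tensor([0.0, 0.5, 2.0])
target = torch.zeros(3)

l1 = nn.L1Loss(reduction='none')(pred, target)            # |x|
smooth = nn.SmoothL1Loss(reduction='none')(pred, target)  # 0.5 * x^2 if |x| < 1, else |x| - 0.5

print(l1)      # tensor([0.0000, 0.5000, 2.0000])
print(smooth)  # tensor([0.0000, 0.1250, 1.5000])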

commented


Thanks for the fast reply. There's another thing I think you could add to your note: using L1 instead of the smooth version can cause exploding gradients (which happened to you), so you have to choose batch_size = 8. If we want to use batch_size = 32, switching to smooth L1 makes the loss start from around 24 and converge to ~2.5 (mean) after 120k iterations.

Also, I'm experimenting with variants of the implementation. If you're interested in whether unclipped prior boxes affect mAP, I'm testing that as well, so I'll update the results here (in the comments) if that's fine with you.


Sure, thanks.