sgrvinod / a-PyTorch-Tutorial-to-Object-Detection

SSD: Single Shot MultiBox Detector | a PyTorch Tutorial to Object Detection


prior box not clipped as expected

AakiraOtok opened this issue

commented

[screenshot]

As you stated, prior boxes are supposed to be clipped if they fall outside the image, but in your code:

prior_boxes.clamp_(0, 1)  # (8732, 4)

prior_boxes is still in YOLO style (meaning [cx, cy, w, h]), but if we want to clip it, we would want something like this:

prior_boxes = pascalVOC_style(prior_boxes)  # convert to (xmin, ymin, xmax, ymax)
prior_boxes.clamp_(0, 1)
prior_boxes = yolo_style(prior_boxes)  # convert back to (cx, cy, w, h)

Am I right?
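For what it's worth, here is a self-contained sketch of that round trip in plain PyTorch; the conversion helpers are written out below (the repo's utils.py has similar ones, cxcy_to_xy and xy_to_cxcy), so pascalVOC_style / yolo_style above are just placeholders for them:

import torch

def cxcy_to_xy(cxcy):
    # (cx, cy, w, h) -> (xmin, ymin, xmax, ymax), all as fractions of the image size
    return torch.cat([cxcy[:, :2] - cxcy[:, 2:] / 2,
                      cxcy[:, :2] + cxcy[:, 2:] / 2], dim=1)

def xy_to_cxcy(xy):
    # (xmin, ymin, xmax, ymax) -> (cx, cy, w, h)
    return torch.cat([(xy[:, :2] + xy[:, 2:]) / 2,
                      xy[:, 2:] - xy[:, :2]], dim=1)

prior_boxes = torch.tensor([[0.1, 0.1, 0.3, 0.3]])  # center form; corners overshoot

xy = cxcy_to_xy(prior_boxes)  # tensor([[-0.0500, -0.0500,  0.2500,  0.2500]])
xy.clamp_(0, 1)               # tensor([[0.0000, 0.0000, 0.2500, 0.2500]])
prior_boxes = xy_to_cxcy(xy)  # back to center form: tensor([[0.1250, 0.1250, 0.2500, 0.2500]])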

That's an interesting point, but I think this is covered. Notice how the center-encoded coordinates are also in percentages of the image dimensions:

[screenshot of the relevant code]

So clamping between 0 and 1 should work the same.

commented

@AndreiMoraru123
[screenshot: prior boxes from the author's code extending outside the image]
w and h are just the size of the prior box. What I mean is that the top-left and bottom-right corners of the rectangle have to be recalculated from (cx, cy, w, h), and those corners can be negative or greater than 1, so the box can still lie outside the image.
Say (cx, cy) = (0.1, 0.1) and (w, h) = (0.3, 0.3). You can see that cx, cy, w, and h are all in the valid range, but the prior box still falls outside: xmin = 0.1 - 0.3/2 = -0.05. I showed the prior boxes from the author's code in the image above; you can see this happening.

I see your point. You might be onto something. Indeed, if you generate a prior box centered close enough to a corner of the image, you can easily overshoot the image with widths and heights that are themselves within bounds. For example, for a prior centered at 20% of a (500, 500) image, i.e. at pixel (100, 100), any (width, height) combo greater than 40% of the image will overshoot, and the clamp will do nothing about it.

cx, cy, w, h = 0.2, 0.2, 0.5, 0.5

[screenshot: the example prior box overshooting the image]

Maybe this is worth looking into.
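To make that concrete, a quick check in plain PyTorch (a minimal sketch, independent of the repo's code):

import torch

# One prior in center form, all values as fractions of a (500, 500) image.
prior = torch.tensor([[0.2, 0.2, 0.5, 0.5]])

# Clamping the center form is a no-op: every value is already in [0, 1].
assert torch.equal(prior.clamp(0, 1), prior)

# But the actual top-left corner falls outside the image:
top_left = prior[:, :2] - prior[:, 2:] / 2  # tensor([[-0.0500, -0.0500]])
print(top_left * 500)                       # tensor([[-25., -25.]]), i.e. 25 px past the corner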

commented

@AndreiMoraru123
Yup, as I stated, we can avoid it by first converting (cx, cy, w, h) to (xmin, ymin, xmax, ymax) and then clamping into the range [0, 1].
But I wonder how the author achieved the same performance as the paper using these prior boxes. Or did the original Caffe repo do the same thing?

commented

@sgrvinod sorry for bothering you, but I think this is a bug. Besides that, the pretrained model link seems to be dead.

Hi @NicolasHarding, I just looked at it, and I think you are right. Thanks for pointing this out. I was careless and the priors won't be clipped. Therefore, some of my priors are different from those in the paper.

I'm guessing I still get the same performance because there are always enough priors: even for objects at the edges of images, where the "clipped" sizes would matter, there's always a different prior that's close enough and/or the offset calculation is good enough.

I'm not completely sure how to proceed now, because immediately modifying my code to clip those priors would affect use of the pretrained model, which was trained with the different priors. I'll probably make a note of this issue in the code and tutorial in the next couple of days.

About the pretrained model: I could access it, and I know some others could as well. Could you check again? I just changed the permissions of the Google Drive folder above it as well. If it still doesn't work, let me know and I'll make it available elsewhere. @AndreiMoraru123, do you have the same problem?

The link to the torch checkpoint works from Chrome. Google Drive is always iffy on other browsers for reasons, @NicolasHarding. Also, @sgrvinod, I know you made these tutorials years ago, but now I'm thinking Hugging Face would be the best place to upload models and get more exposure for your work.

@AndreiMoraru123 thanks for confirming! I'll check out HuggingFace, this idea never occurred to me, thank you.

commented

@sgrvinod I sincerely apologize for bothering you multiple times, but it seems that this is not correct. In class MultiBoxLoss:

self.smooth_l1 = nn.L1Loss()

nn.L1Loss() is not actually smooth L1 loss.


Hi @AakiraOtok, it's not a bother at all. Yeah, I was aware of this. Thanks for reminding me. See #60. I'll make a note of this difference from the paper in the tutorial and code, but probably won't change the loss function now as L1 loss appears to work just as well.
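For anyone comparing the two losses side by side: nn.SmoothL1Loss is quadratic for errors below 1 and linear beyond, while nn.L1Loss is linear everywhere. A minimal check in plain PyTorch (not the repo's code):

import torch
import torch.nn as nn

pred = torch.tensor([0.0, 0.5, 2.0])
target = torch.zeros(3)

l1 = nn.L1Loss(reduction='none')(pred, target)            # |x|
smooth = nn.SmoothL1Loss(reduction='none')(pred, target)  # 0.5 * x^2 if |x| < 1, else |x| - 0.5

print(l1)      # tensor([0.0000, 0.5000, 2.0000])
print(smooth)  # tensor([0.0000, 0.1250, 1.5000])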

commented


Thanks for the fast reply. There's another thing I think you could add to your note: using L1 instead of the smooth version can cause exploding gradients (which happened to you), so you have to choose batch_size = 8. If we want to use batch_size = 32, switching to smooth L1 makes the loss start from around 24 and converge to ~2.5 (mean) after 120k iterations.

Also, I'm experimenting with variants of the implementation. If you're interested in whether unclipped prior boxes affect mAP, I'm testing that as well, so I'll update the results here (in the comments) if that's fine with you.


Sure, thanks.