Bug in computation of h_mask_size_

Question

ClementLeBihan opened this issue 6 months ago · comments

ClementLeBihan commented 6 months ago

Hi,

There is a bug in the code that computes h_mask_size_.

As a reminder, we first compute det_num_ as follows:

det_num_ = param_.feature_size.x * param_.feature_size.y * param_.num_anchors;  // 216 x 248 x 6 = 321408

Then h_mask_size_ is computed as follows (with #define DIVUP(x, y) (x + y - 1) / y):

h_mask_size_ = det_num_ * DIVUP(det_num_, NMS_THREADS_PER_BLOCK) * sizeof(uint64_t);

Because the macro body has no outer parentheses, the preprocessor expands this line to:

h_mask_size_ = det_num_ * (det_num_ + NMS_THREADS_PER_BLOCK - 1) / NMS_THREADS_PER_BLOCK * sizeof(uint64_t);
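
To make the precedence problem easier to see, here is a minimal sketch with small numbers. DIVUP_UNSAFE copies the macro quoted above; DIVUP_SAFE is a hypothetical, fully parenthesized variant that is not taken from the repository.

```cpp
#include <cstdint>

// Same body as the DIVUP macro quoted in this issue: no outer parentheses.
#define DIVUP_UNSAFE(x, y) (x + y - 1) / y

// Hypothetical fix (an assumption, not the repository's code):
// parenthesize every argument and the whole expression.
#define DIVUP_SAFE(x, y) (((x) + (y) - 1) / (y))

// Expands to 10 * (7 + 4 - 1) / 4, evaluated left to right: 100 / 4 = 25.
constexpr std::uint64_t unsafe_result() { return 10u * DIVUP_UNSAFE(7u, 4u); }

// Expands to 10 * (((7) + (4) - 1) / (4)): the ceiling division runs first, 10 * 2 = 20.
constexpr std::uint64_t safe_result() { return 10u * DIVUP_SAFE(7u, 4u); }

static_assert(unsafe_result() == 25, "multiplication happens before the division");
static_assert(safe_result() == 20, "ceil(7 / 4) = 2 is computed first");
```

With the unparenthesized macro, the multiplication by the left operand is applied before the division, so the expression is no longer "left operand times ceil(x / y)".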
The multiplication is evaluated first, giving 321408 * (321408 + 64 - 1) = 103323351168, which is way too large to fit in an unsigned int.
The value wraps around, and the final h_mask_size_ is 30517008 (about 30 MB) instead of the real value, which should be 12915418896 (about 12 GB)! If I add parentheses around DIVUP to get the real value, then checkRuntime(cudaMemsetAsync(h_mask_, 0, h_mask_size_, _stream)); takes far too much time...
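
The arithmetic above can be reproduced with a small sketch. It assumes det_num_ is a 32-bit unsigned integer and NMS_THREADS_PER_BLOCK is 64, as the numbers in this issue imply; the constant and function names are made up for illustration and are not from the repository.

```cpp
#include <cstdint>

// Assumed values, taken from the numbers quoted in this issue.
constexpr std::uint32_t kDetNum = 216u * 248u * 6u;  // 321408
constexpr std::uint32_t kNmsThreadsPerBlock = 64u;

// Mirrors the expanded expression: the 32-bit product wraps modulo 2^32
// before the division, so the result is far too small.
constexpr std::uint64_t mask_size_wrapped() {
    return kDetNum * (kDetNum + kNmsThreadsPerBlock - 1u) / kNmsThreadsPerBlock
           * sizeof(std::uint64_t);
}

// Same operator order, but widened to 64 bits first: no wraparound.
constexpr std::uint64_t mask_size_same_order_64bit() {
    return static_cast<std::uint64_t>(kDetNum) * (kDetNum + kNmsThreadsPerBlock - 1u)
           / kNmsThreadsPerBlock * sizeof(std::uint64_t);
}

// Ceiling division done first (what a parenthesized DIVUP would compute), in 64 bits.
constexpr std::uint64_t mask_size_divup_first_64bit() {
    return static_cast<std::uint64_t>(kDetNum)
           * ((kDetNum + kNmsThreadsPerBlock - 1u) / kNmsThreadsPerBlock)
           * sizeof(std::uint64_t);
}

static_assert(mask_size_wrapped() == 30517008ULL, "about 30 MB");
static_assert(mask_size_same_order_64bit() == 12915418896ULL, "about 12.9 GB");
static_assert(mask_size_divup_first_64bit() == 12912887808ULL, "about 12.9 GB");
```

Note that the two 64-bit variants differ slightly: 12915418896 comes from keeping the unparenthesized operator order, while performing the ceiling division first gives 12912887808. Either way the buffer is roughly 12.9 GB, which is consistent with cudaMemsetAsync taking a long time.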

Are you sure about this h_mask_size_ computation? I'm not an expert in NMS, so I can't fix it myself :/