What if the training pairs have no bounding boxes?

Frank-Dz opened this issue 3 years ago

            area[i, 0] = obj['bbox'][2]*obj['bbox'][3]

This method requires bounding boxes during training. What if the training pairs have no bounding boxes?

How can we then calculate the area and the offset loss?

Gengzigang · Answer 1 · Fri Mar 05 2021 21:32:57 GMT+0800 (China Standard Time)

Hi, we use the area of a person's bounding box to normalize that person's offsets when calculating the offset-map loss.
Alternatively, you can estimate the person's size from the maximum and minimum coordinates of the keypoints. The choice of normalization factor does not affect performance.
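A minimal sketch of the keypoint-based alternative, assuming COCO-style `(x, y, v)` keypoint triples where `v > 0` marks a labeled keypoint. The function `keypoint_extent_area` and its `min_side` parameter are hypothetical, not part of the repository; the idea is simply to replace `obj['bbox'][2]*obj['bbox'][3]` with the width times height of the box spanned by the labeled keypoints.

```python
def keypoint_extent_area(keypoints, min_side=1.0):
    """Estimate a person's size from the tight box around labeled keypoints.

    keypoints: list of (x, y, v) triples; v > 0 means the keypoint is labeled.
    min_side: lower bound on each side so the area never collapses to zero.
    Returns None when no keypoint is labeled (nothing to normalize against).
    """
    pts = [(x, y) for x, y, v in keypoints if v > 0]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    width = max(max(xs) - min(xs), min_side)
    height = max(max(ys) - min(ys), min_side)
    return width * height


# Hypothetical drop-in for: area[i, 0] = obj['bbox'][2] * obj['bbox'][3]
kpts = [(10.0, 20.0, 2), (50.0, 80.0, 2), (0.0, 0.0, 0)]  # last one unlabeled
print(keypoint_extent_area(kpts))  # 2400.0  (40 wide * 60 tall)
```

With a single labeled keypoint the extent collapses to `min_side ** 2`, so this estimate is only meaningful when several keypoints per person are labeled.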

By the way, we use the average of all keypoints belonging to the same person as that person's center. The code also lets you choose the center of the person's bounding box instead, which may cause confusion. The two choices give the same AP.
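The two center choices can be sketched as follows, assuming COCO-style `(x, y, v)` keypoints and `[x, y, w, h]` boxes. Both helper names are hypothetical; this only illustrates the two definitions, not the repository's code.

```python
def keypoint_mean_center(keypoints):
    """Center as the mean of the labeled keypoints (v > 0)."""
    pts = [(x, y) for x, y, v in keypoints if v > 0]
    if not pts:
        return None
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def bbox_center(bbox):
    """Center of a COCO-style [x, y, w, h] bounding box."""
    x, y, w, h = bbox
    return (x + w / 2.0, y + h / 2.0)


kpts = [(10.0, 20.0, 2), (30.0, 40.0, 2), (0.0, 0.0, 0)]
print(keypoint_mean_center(kpts))            # (20.0, 30.0)
print(bbox_center([0.0, 10.0, 40.0, 40.0]))  # (20.0, 30.0)
```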

Frank-D · Answer 2 · Fri Mar 05 2021 21:36:12 GMT+0800 (China Standard Time)

Yes. But what if there is only one joint instead of the 17 joints in COCO? How should we set up the code? Thanks!

Frank-D · Answer 3 · Fri Mar 05 2021 21:39:15 GMT+0800 (China Standard Time)

For example, I only want to do head detection.

My concern is that this will change the way we calculate the loss.

Gengzigang · Answer 4 · Fri Mar 05 2021 21:45:59 GMT+0800 (China Standard Time)

The normalization factor should be designed for your project. You can use any length that reflects the person's size, or you can regress the offsets directly without this normalization.
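A rough sketch of what normalizing the offset loss by a person size means, with the unnormalized option included. I assume a smooth-L1 penalty divided by the square root of the area (so the divisor is a length); the repository's exact weighting may differ. `smooth_l1` and `offset_loss` are hypothetical names.

```python
import math


def smooth_l1(x, beta=1.0):
    """Elementwise smooth-L1 (Huber-style) penalty."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def offset_loss(pred, target, area=None):
    """Offset loss for one person.

    pred, target: lists of (dx, dy) offsets, one pair per keypoint.
    area: person size (bbox area, keypoint extent, ...). If None, the
          offsets are regressed without normalization.
    """
    loss = sum(smooth_l1(p - t)
               for (px, py), (tx, ty) in zip(pred, target)
               for p, t in ((px, tx), (py, ty)))
    if area is None:
        return loss
    # Divide by a length so large and small people contribute comparably.
    return loss / math.sqrt(area)


pred = [(3.0, 4.0)]
target = [(1.0, 4.0)]
print(offset_loss(pred, target))            # 1.5  (|3 - 1| = 2 -> 2 - 0.5)
print(offset_loss(pred, target, area=9.0))  # 0.5
```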

Frank-D · Answer 5 · Fri Mar 05 2021 21:47:45 GMT+0800 (China Standard Time)

OK. I am trying to generate bounding boxes from the projection, but the box covers the head rather than the whole body.

Thank you very much!

Gengzigang · Answer 6 · Fri Mar 05 2021 22:05:10 GMT+0800 (China Standard Time)

For your case, I think a heatmap is more suitable: you only need to detect head keypoints, and a heatmap can detect all keypoints of the same type precisely and efficiently.
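A sketch of a training target for that idea, assuming a standard Gaussian heatmap in which one channel holds every head in the image. `head_heatmap` and `sigma` are hypothetical choices, not taken from the repository.

```python
import math


def head_heatmap(points, height, width, sigma=2.0):
    """Single-channel heatmap target with one Gaussian peak per head.

    points: list of (x, y) head positions in heatmap coordinates.
    Overlapping Gaussians are merged with max, so each peak stays at 1.0.
    """
    heat = [[0.0] * width for _ in range(height)]
    for cx, cy in points:
        for y in range(height):
            for x in range(width):
                g = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
                heat[y][x] = max(heat[y][x], g)
    return heat


hm = head_heatmap([(2, 2), (7, 5)], height=8, width=10, sigma=1.0)
print(hm[2][2], hm[5][7])  # 1.0 1.0
```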

Frank-D · Answer 7 · Fri Mar 05 2021 22:16:59 GMT+0800 (China Standard Time)

Sorry, I am a bit confused about this:

Do you mean we should remove the offset loss entirely and use only the heatmap loss?

Or, by "I think the heatmap is more suitable...", do you mean we still need both the heatmap loss and the offset loss during detection?

Gengzigang · Answer 8 · Fri Mar 05 2021 22:42:40 GMT+0800 (China Standard Time)

You can remove the offset regression part entirely, in both training and post-processing.
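With the offset branch removed, the training objective reduces to the heatmap term alone. A minimal sketch, assuming a plain mean-squared-error heatmap loss (DEKR's own heatmap loss may weight pixels differently); `heatmap_mse`, `total_loss`, and `offset_weight` are hypothetical names.

```python
def heatmap_mse(pred, target):
    """Mean squared error between two same-sized 2D heatmaps (lists of rows)."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def total_loss(heat_pred, heat_target, offset_term=None, offset_weight=1.0):
    """Heatmap loss, plus an offset term only if offset regression is kept."""
    loss = heatmap_mse(heat_pred, heat_target)
    if offset_term is not None:
        loss += offset_weight * offset_term
    return loss


pred = [[0.0, 0.5], [1.0, 0.0]]
target = [[0.0, 0.0], [1.0, 0.0]]
print(total_loss(pred, target))  # 0.0625: heatmap only, offset branch removed
```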

In this method, the offset is the displacement from the center point to the keypoint location. In your case, however, I do not know how to choose the center point. I am curious how you would use the offset regression.
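To make that definition concrete: the offset target for each keypoint is its position minus the person's center. The sketch below (hypothetical helper, 2D points only) also shows why the question arises for heads: with a single keypoint, a mean-of-keypoints center coincides with the keypoint, so every offset target is zero.

```python
def offset_targets(keypoints, center):
    """Offset target for each keypoint: its location minus the person's center."""
    cx, cy = center
    return [(x - cx, y - cy) for x, y in keypoints]


# Full body: the center is the mean of the keypoints, offsets are non-trivial.
body = [(10.0, 20.0), (30.0, 60.0)]
print(offset_targets(body, (20.0, 40.0)))  # [(-10.0, -20.0), (10.0, 20.0)]

# Head only: one keypoint, so its mean center is the keypoint itself.
head = [(15.0, 25.0)]
print(offset_targets(head, head[0]))       # [(0.0, 0.0)]
```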

Frank-D · Answer 9 · Fri Mar 05 2021 22:45:41 GMT+0800 (China Standard Time)

Yes, that's exactly the point! In my case I only want to detect heads, and many images do not show the full body. So the center position would have to be the center of the head's bounding box, which is very close to the head keypoint itself.
So removing the offsets may be a good choice.
Thank you again!

Frank-D · Answer 10 · Fri Mar 05 2021 23:33:08 GMT+0800 (China Standard Time)

Frank-D · Answer 11 · Sat Mar 13 2021 09:23:27 GMT+0800 (China Standard Time)

Hi, I am still a little confused. Sorry to bother you again.
It seems the method still needs non-maximum suppression to filter redundant bounding boxes.
I have trained the network to predict only one point per person in the heatmap. How can we avoid the non-maximum suppression step, which involves bounding boxes and offsets? (In other words, we do not need to group keypoints into people, since there is only one point per person.) Any suggestions?
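One common way to do this without boxes or offsets, offered as an assumption rather than as DEKR's post-processing: treat every pixel that is the maximum of its 3x3 neighborhood and above a score threshold as one detection. This is the max-pooling style peak extraction used by heatmap-based detectors such as CenterNet; `find_peaks`, `threshold`, and the 3x3 window are hypothetical choices.

```python
def find_peaks(heat, threshold=0.3):
    """Return (x, y, score) for every local maximum of a 2D heatmap.

    A pixel counts as a peak if it is >= every value in its 3x3
    neighborhood and at least `threshold`. Each peak is one detected
    head, so no boxes, offsets, or grouping are involved.
    """
    h, w = len(heat), len(heat[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heat[y][x]
            if v < threshold:
                continue
            window = [heat[ny][nx]
                      for ny in range(max(0, y - 1), min(h, y + 2))
                      for nx in range(max(0, x - 1), min(w, x + 2))]
            if v >= max(window):
                peaks.append((x, y, v))
    # Highest-scoring detections first.
    return sorted(peaks, key=lambda p: -p[2])


heat = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.0, 0.3, 0.1],
    [0.0, 0.0, 0.4, 0.8, 0.2],
]
print(find_peaks(heat))  # [(1, 1, 0.9), (3, 3, 0.8)]
```

Because each head produces one peak, the peaks are already the final detections under this scheme; the window size and threshold are the knobs that control duplicate suppression.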

Frank-D · Answer 12 · Sat Mar 13 2021 09:37:38 GMT+0800 (China Standard Time)

Actually, there is a paper quite close to my situation that does object detection without bounding boxes: https://openaccess.thecvf.com/content_CVPR_2019/html/Ribera_Locating_Objects_Without_Bounding_Boxes_CVPR_2019_paper.html
But I would like to use a heatmap instead, since I think it will give more stable results.
So I would like to ask for your suggestions.
Thank you very much!