HRNet / HRNet-Bottom-Up-Pose-Estimation

This is an official PyTorch implementation of “Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive Keypoint Estimates” (https://arxiv.org/abs/2006.15480).

Home page: https://github.com/HRNet/HRNet-Bottom-Up-Pose-Estimation

What if the training pairs are without bbox?

Frank-Dz opened this issue:

            area[i, 0] = obj['bbox'][2]*obj['bbox'][3]

This method requires a bbox for each person during training. What if the training pairs have no bbox?

How can we calculate the area and the offset loss?

Hi, we use the area of a person's bounding box to normalize that person's offsets when computing the loss on the offset map.
Alternatively, you can use the maximum and minimum coordinates of the keypoints to estimate the size of the person. The choice of normalization term doesn't affect performance.
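
A minimal sketch of the keypoint-extent alternative, assuming COCO-style keypoints stored as a flat `[x, y, v, ...]` list where `v > 0` marks a labeled joint. The function name `area_from_keypoints` and the `min_side` floor are hypothetical and are not part of this repository.

```python
import numpy as np

def area_from_keypoints(keypoints, min_side=1.0):
    """Estimate a person's area from the extent of its labeled keypoints.

    keypoints: flat COCO-style list [x1, y1, v1, x2, y2, v2, ...] (assumed format).
    min_side:  lower bound on width/height so a single point does not give area 0.
    """
    kpts = np.asarray(keypoints, dtype=np.float32).reshape(-1, 3)
    labeled = kpts[kpts[:, 2] > 0]          # drop unlabeled joints (v == 0)
    if labeled.shape[0] == 0:
        return 0.0
    width = max(labeled[:, 0].max() - labeled[:, 0].min(), min_side)
    height = max(labeled[:, 1].max() - labeled[:, 1].min(), min_side)
    return float(width * height)

# Stand-in for: area[i, 0] = obj['bbox'][2] * obj['bbox'][3]
obj = {'keypoints': [10, 20, 2, 50, 100, 2, 0, 0, 0]}
area = area_from_keypoints(obj['keypoints'])   # extent is 40 x 80
```

The keypoint extent is usually tighter than an annotated bbox, so the absolute values differ from the bbox area; as a normalization term only the relative scale between people matters.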

By the way, we use the average of all keypoints belonging to the same person as that person's center. The code also lets you choose the center of the person's bounding box instead, which may cause confusion. The APs of the two choices are the same.
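
The two center choices can be sketched as follows. This is an illustration under assumptions, not the repository's code: keypoints are taken to be a flat COCO-style `[x, y, v, ...]` list, the bbox is `[x, y, w, h]`, and `person_center` with its `use_bbox_center` flag is a hypothetical helper.

```python
import numpy as np

def person_center(keypoints, bbox=None, use_bbox_center=False):
    """Return (cx, cy) for one person, or None if nothing is labeled.

    keypoints: flat COCO-style list [x, y, v, ...] (assumed format).
    bbox:      optional [x, y, w, h]; only used when use_bbox_center is True.
    """
    if use_bbox_center and bbox is not None:
        x, y, w, h = bbox
        return (x + w / 2.0, y + h / 2.0)
    kpts = np.asarray(keypoints, dtype=np.float32).reshape(-1, 3)
    labeled = kpts[kpts[:, 2] > 0]          # average only the labeled joints
    if labeled.shape[0] == 0:
        return None
    cx, cy = labeled[:, :2].mean(axis=0)
    return (float(cx), float(cy))

kpts = [10, 20, 2, 50, 100, 2, 0, 0, 0]
center_from_kpts = person_center(kpts)
center_from_bbox = person_center(kpts, bbox=[10, 20, 40, 80], use_bbox_center=True)
```

The keypoint average needs no bbox at all, which is what makes it usable for annotations that only contain keypoints.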

For example, I just want to do head detection:
[image]

My concern is that this will change the way we calculate the loss.

The normalization term should be designed for your project. You can use any length that reflects the size of the person, or you can regress the offsets directly without normalization.
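
A rough sketch of what "with or without normalization" could look like for a single person. The L1 form of the loss, the helper name `offset_l1_loss`, and the use of `sqrt(area)` as the size are assumptions made for illustration; the thread does not state the exact formula used in the repository.

```python
import numpy as np

def offset_l1_loss(pred, target, size=None):
    """Mean absolute offset error for one person, optionally divided by a size.

    pred, target: arrays of shape (num_keypoints, 2) holding (dx, dy) offsets.
    size:         any positive length reflecting the person's scale, e.g.
                  sqrt(area) (an assumption of this sketch). None disables
                  normalization and regresses raw pixel offsets.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    err = np.abs(pred - target)
    if size is not None:
        err = err / float(size)
    return float(err.mean())

pred = [[4.0, 0.0], [0.0, 8.0]]
target = [[0.0, 0.0], [0.0, 0.0]]
raw_loss = offset_l1_loss(pred, target)                      # pixel units
norm_loss = offset_l1_loss(pred, target, size=np.sqrt(16))   # divided by 4
```

Dividing by a size keeps large people from dominating the loss simply because their offsets are longer in pixels.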

OK. I am trying to generate the bbox from the projection, but the bbox covers the head rather than the whole body.

Thank you very much!

For your case, I think the heatmap is more suitable: you only need to detect head keypoints, and a heatmap can detect all keypoints of the same type precisely and efficiently.
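
To make the heatmap-only route concrete, here is a minimal sketch of reading every instance of one keypoint type off a single heatmap channel by keeping local maxima in a 3x3 window. This is a common peak-picking technique, not code from this repository; `heatmap_peaks` and the `threshold` value are hypothetical.

```python
import numpy as np

def heatmap_peaks(heatmap, threshold=0.3):
    """Return (x, y, score) for each local maximum of a 2D heatmap.

    A pixel is kept when its score exceeds `threshold` and no pixel in its
    3x3 neighborhood has a higher score. Equal-valued neighbors are all kept.
    """
    hm = np.asarray(heatmap, dtype=np.float32)
    h, w = hm.shape
    padded = np.pad(hm, 1, mode='constant', constant_values=-np.inf)
    # Max over each 3x3 window, built from the nine shifted views.
    neighborhood_max = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    ys, xs = np.nonzero((hm >= neighborhood_max) & (hm > threshold))
    return [(int(x), int(y), float(hm[y, x])) for y, x in zip(ys, xs)]

hm = np.zeros((5, 6), dtype=np.float32)
hm[1, 2] = 0.9   # head A at (x=2, y=1)
hm[1, 3] = 0.5   # slope of the same blob, suppressed by its stronger neighbor
hm[3, 5] = 0.7   # head B at (x=5, y=3)
peaks = heatmap_peaks(hm)   # two peaks: (x=2, y=1) and (x=5, y=3)
```

Each surviving peak is one detection, so no bbox, center, or offset is involved in this step.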

Sorry, I am a bit confused about this:

[image]

Do you mean we should remove the offset loss and use only the heatmap loss?

Or, by "the heatmap is more suitable", do you mean we still need both the heatmap loss and the offset loss during detection?

You can remove the offset regression part entirely, in both training and post-processing.

In this method, the offset is the displacement between a keypoint's location and the center point's location. In your case, however, I do not know how you would choose the center point. I am curious how you use the offset regression.
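
The offset definition above can be sketched as below. The sign convention (keypoint minus center), the flat COCO-style `[x, y, v, ...]` input, and the helper name `offset_targets` are assumptions for illustration, not the repository's implementation.

```python
import numpy as np

def offset_targets(keypoints, center):
    """Offsets (dx, dy) from a person's center to each labeled keypoint.

    keypoints: flat COCO-style list [x, y, v, ...] (assumed format).
    center:    (cx, cy) of the person.
    Returns (offsets, mask): offsets has shape (K, 2), mask marks labeled joints.
    The sign convention keypoint - center is an assumption of this sketch.
    """
    kpts = np.asarray(keypoints, dtype=np.float32).reshape(-1, 3)
    mask = kpts[:, 2] > 0
    offsets = np.zeros((kpts.shape[0], 2), dtype=np.float32)
    offsets[mask] = kpts[mask, :2] - np.asarray(center, dtype=np.float32)
    return offsets, mask

# Two body joints around a shared center: the offsets carry real information.
body_offsets, _ = offset_targets([10, 20, 2, 50, 100, 2], center=(30, 60))

# A single head keypoint that is also the center: the offset is zero.
head_offsets, head_mask = offset_targets([30, 60, 2], center=(30, 60))
```

With only one keypoint per person, the center and the keypoint coincide (or nearly so), so the offset target has almost nothing left to encode.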

Yes, that's the point! In my case I only want to detect the head, and many images do not show the full body. So the center position would be the center of the head bbox, yet that center is very close to the head keypoint itself.
So maybe removing the offset part is a good choice.
Thank you again!

Hi, I am still a little confused. Really sorry to bother you.
It seems the method still needs non-maximum suppression to filter redundant bboxes.
I have trained the network to predict only one point per person in the heatmap. How can we avoid the non-maximum suppression, which involves the bbox and the offsets? (In other words, we do not need to group keypoints into people, since there is only one point for each person.) Any suggestions?
[image]

Actually, there is a paper quite close to my situation, where they do object detection without bboxes: https://openaccess.thecvf.com/content_CVPR_2019/html/Ribera_Locating_Objects_Without_Bounding_Boxes_CVPR_2019_paper.html
But I would prefer to use the heatmap, since I think it will give more stable results, so I would like to ask for your suggestions.
Thank you very much!