About the top-down network

Question

hongsukchoi opened this issue 3 years ago · comments

Hongsuk Benjamin Choi commented 3 years ago

Hi,

I found your paper very interesting. I can't wait until the code is released, so I'm asking here.

The paper says that the TD network estimates all joints for all persons in a bounding box, but GCN&TCN seems to produce one person per bounding box. How do you group or select the joints of a person in a bounding box before feeding the joint heatmaps to GCN&TCN? Or do you feed all joint heatmaps to GCN&TCN? (I don't think that's possible.)

Also, the BU network uses the concatenation of the joint heatmaps and the input frame as input. But how? Is the number of input channels 3 (RGB) + 1 (heatmap)? I see several potential problems: the number of people in the input frame changes, which may lead to a dynamic number of input channels, or to overlapping joint heatmaps of the same person. Could you give more details about this?

Thank you!

Cheng Yu · Answer 1 · Sun May 02 2021 11:38:48 GMT+0800 (China Standard Time)

We first group the instances by ID tag, and then use NMS to select the valid heatmaps of each instance.
All heatmaps are mapped back to the size of the original image, so the input has only 3 RGB channels + 17 keypoint heatmap channels.
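
For readers who want to picture the fixed 3 + 17 channel layout, here is a minimal NumPy sketch. It is not the released implementation. The function names (`resize_nearest`, `build_bu_input`), the bounding-box format `(x0, y0, x1, y1)`, the nearest-neighbour resize, and the element-wise max used to merge heatmaps from different detections are all assumptions made for illustration.

```python
import numpy as np

NUM_JOINTS = 17  # 17 keypoint heatmaps, as stated in the answer


def resize_nearest(heatmap, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array (assumed stand-in for real interpolation)."""
    in_h, in_w = heatmap.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return heatmap[rows[:, None], cols[None, :]]


def build_bu_input(frame, detections):
    """Build an (H, W, 3 + 17) array from a frame and per-detection heatmaps.

    frame:      (H, W, 3) RGB image.
    detections: list of (heatmaps, bbox) pairs, where heatmaps has shape
                (17, h, w) and bbox is (x0, y0, x1, y1) in frame pixels.
    """
    h, w, _ = frame.shape
    merged = np.zeros((h, w, NUM_JOINTS), dtype=np.float32)
    for heatmaps, (x0, y0, x1, y1) in detections:
        for j in range(NUM_JOINTS):
            # Map the crop-level heatmap back to its location in the full frame.
            patch = resize_nearest(heatmaps[j], y1 - y0, x1 - x0)
            region = merged[y0:y1, x0:x1, j]  # view into merged
            # Assumption: overlapping detections are merged by element-wise max.
            np.maximum(region, patch, out=region)
    # The channel count stays 20 no matter how many people are in the frame.
    return np.concatenate([frame.astype(np.float32), merged], axis=-1)


frame = np.zeros((64, 48, 3), dtype=np.float32)
det_a = (np.full((NUM_JOINTS, 8, 8), 0.5, dtype=np.float32), (0, 0, 32, 32))
det_b = (np.full((NUM_JOINTS, 8, 8), 0.9, dtype=np.float32), (16, 16, 48, 48))
x = build_bu_input(frame, [det_a, det_b])
print(x.shape)  # (64, 48, 20)
```

Because each joint type shares one channel across all people, adding or removing a person changes the heatmap contents but not the input shape.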

Hongsuk Benjamin Choi · Answer 2 · Thu May 13 2021 12:42:28 GMT+0800 (China Standard Time)

Thanks for the clarification! Hope to see the code soon :)