about heatmap size

Question

akziq opened this issue 6 years ago · comments

Hi @chenyilun95, great work!
What about generating a heatmap the same size as the original image (image: 256x192, heatmap: 256x192)? Would it increase AP thanks to the pixel-to-pixel match?
Thanks.

Yilun Chen · Answer 1 · Sat Apr 28 2018 11:07:28 GMT+0800 (China Standard Time)

The first question is how to upsample to generate the final output heatmap. The options I see:

Bilinear upsampling: it gives more accurate gradient back-propagation for each pixel. At test time, however, directly upsampling cannot produce a genuinely higher-resolution heatmap, which probably reduces the gain. A similar experiment was done in https://github.com/chenyilun95/tf-cpn/issues/4, which may show that better gradients at high resolution don't help.
Skip connections with the lower-level feature maps: their semantics probably aren't clear enough.
Deconvolution: recent work (Simple Baselines for Human Pose Estimation and Tracking) says a deconvolution layer works fine. But they still only upsample the output to 64x48. If that works, it might work as well for a higher-resolution output.
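
To make the bilinear option concrete, here is a minimal pure-Python sketch of upsampling a 64x48 heatmap by a factor of 4. It is not taken from tf-cpn: the function name `bilinear_upsample`, the "align corners" convention, the rows-by-columns ordering, and the toy heatmap are all assumptions made for illustration. The point it shows is that interpolation only blends existing values, so the larger output carries no detail that the 64x48 map did not already contain.

```python
def bilinear_upsample(heatmap, scale):
    """Upsample a 2D heatmap (list of rows) by an integer factor.

    Uses the "align corners" convention: the corner values of the input
    and output grids coincide. This convention is an assumption.
    """
    in_h, in_w = len(heatmap), len(heatmap[0])
    out_h, out_w = in_h * scale, in_w * scale
    out = []
    for y in range(out_h):
        # Fractional source row for this output row.
        sy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        row = []
        for x in range(out_w):
            # Fractional source column for this output column.
            sx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            # Blend the four neighbouring source values.
            top = heatmap[y0][x0] * (1 - wx) + heatmap[y0][x1] * wx
            bottom = heatmap[y1][x0] * (1 - wx) + heatmap[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Hypothetical 64x48 heatmap (64 rows, 48 columns) with a single peak.
low = [[0.0] * 48 for _ in range(64)]
low[20][30] = 1.0
high = bilinear_upsample(low, 4)
print(len(high), len(high[0]))  # 256 192
```

Every output value is a weighted average of at most four input values, so the upsampled peak can be smoother but never sharper than the original one.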

Nevertheless, that's only my view. Experimental results speak louder!

akziq · Answer 2 · Sun Apr 29 2018 13:14:29 GMT+0800 (China Standard Time)

I apologize for my ambiguous wording.
My question is this: the network's last layer outputs 64x48, which is (W/4,H/4).
What about changing the last layer's output to 256x192, which is (W,H)?
That is: orig-img (W,H)->(W/2,H/2)->(W/4,H/4)->.....->(W/4,H/4)->(W/2,H/2)->(W,H) (pre-heatmap)

Would a pixel-to-pixel match between orig-img and pre-heatmap increase AP?

Thanks for your response.
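
For context on the "pixel-to-pixel match" idea, the sketch below shows how a keypoint might be read off a (W/4,H/4) heatmap and mapped back to image pixels. It is illustrative only and not the decoding used in tf-cpn: the function name `decode_keypoint`, the plain argmax, and the cell-centre mapping are assumptions. With a stride of 4, each heatmap cell covers a 4x4 block of image pixels, so a plain argmax can only place the keypoint on that coarse grid.

```python
def decode_keypoint(heatmap, stride=4):
    """Return (row, col) in image pixels for the heatmap's strongest cell.

    Maps a cell index to the centre of the stride x stride image block it
    covers. This mapping convention is an assumption; other codebases use
    different offsets.
    """
    best_value = float("-inf")
    best_cell = (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value = value
                best_cell = (y, x)
    cy, cx = best_cell
    offset = (stride - 1) / 2  # centre of the block, in pixel units
    return (cy * stride + offset, cx * stride + offset)


# Hypothetical 64x48 heatmap with its peak at cell (20, 30).
hm = [[0.0] * 48 for _ in range(64)]
hm[20][30] = 1.0
print(decode_keypoint(hm))  # (81.5, 121.5)
```

A heatmap at full (W,H) resolution would remove this quantization in principle, which is the intuition behind the question.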

Yilun Chen · Answer 3 · Sun Apr 29 2018 19:00:20 GMT+0800 (China Standard Time)

Excuse me... I'm now confused... How would you change the last layer's output to 256x192?

akziq · Answer 4 · Mon Apr 30 2018 15:32:28 GMT+0800 (China Standard Time)

For example:
1. Add an intermediate layer at (W/2,H/2) using bilinear upsampling, deconvolution, or a skip connection.
2. Then bring it up to (W,H) the same way (bilinear upsampling, deconvolution, or a skip connection).
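
The two steps above can be sketched as simple shape arithmetic. This is a rough illustration, not the actual network: the helper `trace_shapes` is hypothetical, it assumes every downsampling step halves and every upsampling step doubles both dimensions, and it ignores any deeper stages in the middle of the network.

```python
def trace_shapes(height, width, down_steps, up_steps):
    """List the feature-map sizes through halving and doubling steps."""
    shapes = [(height, width)]
    h, w = height, width
    for _ in range(down_steps):
        h, w = h // 2, w // 2  # each downsampling step halves both sides
        shapes.append((h, w))
    for _ in range(up_steps):
        h, w = h * 2, w * 2  # each upsampling step doubles both sides
        shapes.append((h, w))
    return shapes


shapes = trace_shapes(256, 192, down_steps=2, up_steps=2)
print(shapes)
# [(256, 192), (128, 96), (64, 48), (128, 96), (256, 192)]

# Pixels per keypoint channel at full resolution versus at 1/4 resolution.
ratio = (256 * 192) // (64 * 48)
print(ratio)  # 16
```

The full-resolution heatmap has 16 times as many pixels per keypoint channel as the 64x48 one, so the two extra upsampling layers and the loss are computed over much larger maps.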

Yilun Chen · Answer 5 · Tue May 01 2018 12:34:05 GMT+0800 (China Standard Time)

Emmmm... then I think my comments above already answer that. Generally, I tend to think it won't work, considering both efficiency and effectiveness.

akziq · Answer 6 · Tue May 01 2018 20:17:12 GMT+0800 (China Standard Time)

@chenyilun95, thank you, I get it.
I notice that most people set the last layer's output to 64x64 (Hourglass Net, etc.) or 64x48 (yours).
So is the best practice for the last layer's output (W/4,H/4)?
Thanks for your response. I will close this issue.