chenyilun95 / tf-cpn

Cascaded Pyramid Network for Multi-Person Pose Estimation (CVPR 2018)

Question about RefineNet architecture

happywu opened this issue · comments

Hi,

I am wondering why 8x upsampling after three bottlenecks is used in RefineNet. Isn't a single 8x upsampling too harsh?

Thanks.

We later concatenate all the feature maps of the different levels together, so they should be the same size.
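As a minimal sketch of this point (NumPy stand-ins for the actual TensorFlow tensors; the spatial size and channel counts here are assumed purely for illustration), the cross-level concatenation only works once every level has been upsampled to the same spatial size:

```python
import numpy as np

# Hypothetical refined features from four pyramid levels: after upsampling,
# every level shares the same spatial size (here 64x64), each with its own
# channel count (256 assumed per level).
levels = [np.zeros((64, 64, c)) for c in (256, 256, 256, 256)]

# Concatenate along the channel axis, as described above.
fused = np.concatenate(levels, axis=-1)
print(fused.shape)  # (64, 64, 1024)
```

If the levels had different spatial sizes, `np.concatenate` (and the corresponding TensorFlow concat) would raise a shape error, which is why all levels are upsampled to a common resolution first.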

@chenyilun95 I mean that the 8x upsampling could be realized by three 2x upsamplings,
i.e., three (bottleneck + 2x upsampling) stages instead of three bottlenecks + one 8x upsampling,
which seems smoother to me. Did you try this architecture?

The bottleneck block doesn't downsample the feature maps.

I guess you misunderstood me.

Let us take the C5 path of your RefineNet alone, and assume C5 in your GlobalNet outputs 8x8 feature maps.

In your RefineNet, you need to upsample these 8x8 feature maps to 64x64 in order to perform the concatenation of feature maps from different levels.
Your architecture applies three bottlenecks first, still resulting in 8x8 feature maps, then one 8x upsampling to get the 64x64 feature maps.

Alternatively, I could use one bottleneck + 2x upsampling to get 16x16 feature maps first, and repeat (bottleneck + 2x upsampling) two more times to reach 64x64 feature maps.

My question is whether you have compared these two ways of obtaining the 8x-upsampled feature maps, or whether you only tried your own architecture in the first place?
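To make the two variants being compared concrete, here is a minimal shape-level sketch (NumPy only; the bottleneck is an identity stand-in since it preserves spatial size, and nearest-neighbor repetition stands in for whatever upsampling op the real network uses):

```python
import numpy as np

def upsample(x, factor):
    # Nearest-neighbor upsampling by repeating rows and columns.
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def bottleneck(x):
    # Stand-in for a residual bottleneck block: spatial size is unchanged.
    return x

# Variant A (as in the repo): three bottlenecks, then a single 8x upsample.
a = np.zeros((8, 8))
for _ in range(3):
    a = bottleneck(a)
a = upsample(a, 8)

# Variant B (proposed): three (bottleneck + 2x upsample) stages,
# 8x8 -> 16x16 -> 32x32 -> 64x64.
b = np.zeros((8, 8))
for _ in range(3):
    b = upsample(bottleneck(b), 2)

print(a.shape, b.shape)  # both (64, 64)
```

Both paths end at the same 64x64 resolution; the question in the thread is only whether interleaving the upsampling with the bottlenecks (Variant B) helps accuracy.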

Oh, I got it. We've tried a similar architecture without the previous FPN features, which did not work as well as this one. But we did not try your architecture.

It is reasonable that adding FPN features leads to good results. So you didn't run experiments that use the same GlobalNet (FPN) and differ only in how the high-resolution feature maps are obtained (how the 8x upsampling is reached), right?

Yes. In the paper we only made some attempts using a hyper net.

Got it! Thanks!

I tried it, and it seems that bottleneck + 2x upsampling cannot achieve better performance...

@7color94 Thanks for the info!