Downsampling rate
benemer opened this issue
Hi!
From your ResBlock class, I can see that you use a constant downsampling rate of 2 by using the nn.AvgPool2d layer with kernel_size=3, stride=2 and padding=1.
However, in your arXiv paper, the first residual block downsamples the width from 2048 to 512, which indicates a downsampling rate of 4. Also, I don't understand how the last layer upsamples the feature map from 1024x64x32 to 2048x64x32, since in your code a Conv2d layer with kernel_size=(1,1) is used here.
Is this a mistake in the visualization of the architecture?
Thank you!
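The factor of 2 mentioned above can be checked with the output-size formula that PyTorch documents for its pooling layers (with the default ceil_mode=False). This is only a sketch in plain Python; the helper name pool_out_size is made up for illustration and is not part of the repository.

```python
def pool_out_size(n, kernel_size=3, stride=2, padding=1):
    """Output length along one axis of a pooling layer (ceil_mode=False).

    Hypothetical helper; defaults mirror the AvgPool2d settings quoted
    in the question (kernel_size=3, stride=2, padding=1).
    """
    return (n + 2 * padding - kernel_size) // stride + 1


print(pool_out_size(2048))  # 1024
print(pool_out_size(64))    # 32
```

With these settings a single pooling layer halves each axis, so going from 2048 to 512 would take two such layers rather than one.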
Hello!!
Yes... Strangely, I hadn't picked up on that visualization error. We tried having an extra layer on both sides, and I must have forgotten to update the figure accordingly.
It should go like this:
2048, 64
1024, 32
512, 16
256, 8
128, 4
The final conv-1x1 shouldn't change the dimensionality either, as you pointed out.
Thanks for pointing it out!
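As a rough check of the conv-1x1 remark, the sketch below applies the output-size formula from the PyTorch Conv2d documentation. The stride of 1 and padding of 0 are assumptions (they are the Conv2d defaults, but the thread does not state them), and conv_out_size is a hypothetical helper name.

```python
def conv_out_size(n, kernel_size=1, stride=1, padding=0, dilation=1):
    """Output length along one axis of a convolution.

    Hypothetical helper; stride=1 and padding=0 are assumed here,
    since the thread only specifies kernel_size=(1, 1).
    """
    return (n + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


print(conv_out_size(64))  # 64
print(conv_out_size(32))  # 32
```

Under those assumptions a 1x1 convolution leaves the height and width of the feature map untouched.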
Thanks a lot for the fast reply and clarification!
Since you use 4 pooling layers, I assume you mean:
2048, 64
1024, 32
512, 16
256, 8
128, 4
Thanks again!
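The sequence listed above can be reproduced by applying the pooling size formula four times. This is a sketch, not the repository's code: trace_sizes is a hypothetical name, and treating each pair as (width, height) is an assumption based on the width of 2048 mentioned in the question.

```python
def trace_sizes(width, height, num_pools=4, kernel_size=3, stride=2, padding=1):
    """Return the (width, height) pair before and after each pooling layer.

    Hypothetical helper; uses the pooling output-size formula with
    ceil_mode=False and the AvgPool2d settings quoted in the thread.
    """
    def out(n):
        return (n + 2 * padding - kernel_size) // stride + 1

    sizes = [(width, height)]
    for _ in range(num_pools):
        width, height = out(width), out(height)
        sizes.append((width, height))
    return sizes


for w, h in trace_sizes(2048, 64):
    print(w, h)
```

Four pooling layers give five entries, running from (2048, 64) down to (128, 4).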
And I did it again ahah!!
Exactly that!