microsoft / DynamicHead


Wrong behavior of modulated convolutions at lines 71-72

LozaKiN opened this issue · comments

Hi,
First, thank you for your work!
I tried to train a network on my side, and it seems some of the modulated convolutions are not behaving as intended. The culprit is the following code (lines 71-72 of dyhead.py):

```python
temp_fea.append(F.upsample_bilinear(self.DyConv[0](x[feature_names[level + 1]], **conv_args),
                                    size=[feature.size(2), feature.size(3)]))
```
When this line runs, the modulated convolution receives an input whose spatial extent is four times smaller than that of the offset and mask (half the size on both the H and W dimensions). Since there is no assert on the input shapes, the code runs without error, but what is computed is not what you expect: the offset and mask are effectively read as flat buffers, and only the first quarter of each is actually consumed. This introduces a large spatial shift in the output of the modulated convolution.
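To make the mismatch concrete, here is a minimal bit of bookkeeping with hypothetical sizes, assuming a 3x3 DCNv2 kernel (i.e. 2*K*K offset values per output location):

```python
K = 3            # kernel size (assumed 3x3)
H, W = 32, 32    # spatial size of the current level (hypothetical)

# The offset is computed from the current level, so it covers H x W locations.
offset_numel = (2 * K * K) * H * W

# But the input passed to DyConv[0] is the coarser level + 1 feature map,
# which is half the size on each spatial dimension.
h, w = H // 2, W // 2
needed_numel = (2 * K * K) * h * w

# The conv only consumes enough offset values for h x w locations:
print(offset_numel // needed_numel)  # -> 4: only the first quarter is read
```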
To fix the issue, I think upsample_bilinear() should be applied to the input x[feature_names[level + 1]] before the convolution, rather than to the layer's output, so that the input matches the spatial size of the offset and mask.
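As a sanity check of the two orderings, here is a self-contained toy that tracks only (C, H, W) shapes; the sizes are made up, and the real DCNv2 kernel is replaced by a shape assertion:

```python
# Toy shape model: tensors are (C, H, W) tuples, no torch required.
def upsample(shape, size):
    c, _, _ = shape
    return (c, size[0], size[1])

def modulated_conv(inp, offset, mask):
    # DCNv2 expects input, offset and mask to share spatial dimensions.
    assert inp[1:] == offset[1:] == mask[1:], "spatial size mismatch"
    return (inp[0], inp[1], inp[2])

feat   = (256, 32, 32)   # current level (hypothetical sizes)
coarse = (256, 16, 16)   # level + 1
offset = (18, 32, 32)    # computed from the current level (2 * 3 * 3 channels)
mask   = (9, 32, 32)

# Current code: convolve the coarse input first, upsample after -> mismatch.
try:
    upsample(modulated_conv(coarse, offset, mask), (32, 32))
except AssertionError as e:
    print("current order fails:", e)

# Proposed order: upsample the input first, then convolve.
out = modulated_conv(upsample(coarse, (32, 32)), offset, mask)
print("proposed order output shape:", out)  # -> (256, 32, 32)
```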
Hope it helps.