Why does the Downsample_PASA_group_softmax use softmax over dim 1?
Windaway opened this issue
Windaway commented
I think this implementation makes the downsampling op pick up information from at most one channel per group. Or are the softmax weights multiplied across all the channels in each group?
Xueyan Zou commented
The weights are multiplied across the channels in each group. Axis 1 has size group*kernel_size*kernel_size, so the softmax runs over the predicted filter weights, not over the feature channels.
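To make the shapes concrete, here is a minimal pure-Python sketch of what this means at a single output position. It is not the repository's code: the helper name `group_filter_one_position`, the list-based layout, and the assumption that channels are split into contiguous blocks per group are all illustrative.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def group_filter_one_position(patches, logits, group, kernel_size):
    """Apply group-wise softmax weights at one output position (hypothetical helper).

    patches: C lists, each holding the kernel_size*kernel_size neighbourhood
             of one input channel.
    logits:  group*kernel_size*kernel_size predicted values, i.e. the entries
             along axis 1 at this position.
    """
    taps = kernel_size * kernel_size
    channels = len(patches)
    assert len(logits) == group * taps
    assert channels % group == 0
    per_group = channels // group  # assumption: contiguous channel blocks

    # Softmax over axis 1: normalises the filter weights, not the channels.
    weights = softmax(logits)

    out = []
    for c, patch in enumerate(patches):
        g = c // per_group
        w = weights[g * taps:(g + 1) * taps]  # shared by every channel in group g
        out.append(sum(wi * xi for wi, xi in zip(w, patch)))
    return out


# 4 channels, 2 groups, 3x3 kernel: axis 1 has 2*3*3 = 18 entries.
patches = [[float(i) for i in range(1, 10)] for _ in range(4)]
logits = [0.1 * i for i in range(18)]
print(group_filter_one_position(patches, logits, group=2, kernel_size=3))
```

Every channel in a group is filtered with the same kernel_size*kernel_size weights, so no channel is dropped. In this sketch the softmax is normalised jointly over all group*kernel_size*kernel_size entries, so the weights of a single group sum to less than 1.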