Why does the Downsample_PASA_group_softmax use softmax over dim 1?

Question

Windaway opened this issue 4 years ago

I think this implementation causes the downsampling op to take information from at most one channel in each group. Or should the result be multiplied by the number of channels per group?

Xueyan Zou · Answer 1 · Tue Oct 13 2020 01:48:44 GMT+0800 (China Standard Time)

It can be multiplied by the number of channels per group. The size of axis 1 is group*kernel_size*kernel_size, so the softmax over dim 1 normalizes across all groups and all kernel positions at once.
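The effect of normalizing over the whole group*kernel_size*kernel_size axis can be checked numerically. At each spatial location, softmax over dim 1 makes the weights sum to 1 across all groups together, not within each group. The sketch below works on one pixel's channel vector in plain Python. The group-major ordering (index g*k*k + p holds kernel position p of group g) and the hypothetical `per_group_softmax` alternative are assumptions for illustration, not the repository's code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of logits."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def joint_softmax(logits, group, kernel_size):
    """Softmax over the whole channel axis, i.e. what dim=1 does at one pixel."""
    assert len(logits) == group * kernel_size * kernel_size
    return softmax(logits)


def per_group_softmax(logits, group, kernel_size):
    """Hypothetical alternative: normalize each group's k*k weights separately.

    Assumes a group-major layout: index g * k*k + p is kernel position p of group g.
    """
    kk = kernel_size * kernel_size
    assert len(logits) == group * kk
    out = []
    for g in range(group):
        out.extend(softmax(logits[g * kk:(g + 1) * kk]))
    return out


def group_sums(weights, group, kernel_size):
    """Total weight falling into each group's k*k block."""
    kk = kernel_size * kernel_size
    return [sum(weights[g * kk:(g + 1) * kk]) for g in range(group)]


if __name__ == "__main__":
    group, k = 2, 3
    # Arbitrary logits for one pixel: group 0 is offset by +2 relative to group 1.
    logits = [float(i % 9) + (2.0 if i < 9 else 0.0) for i in range(group * k * k)]

    joint = joint_softmax(logits, group, k)
    split = per_group_softmax(logits, group, k)
    print([round(s, 3) for s in group_sums(joint, group, k)])  # [0.881, 0.119]
    print([round(s, 3) for s in group_sums(split, group, k)])  # [1.0, 1.0]
```

With the joint softmax, the two groups share a single unit of weight (here split about 0.881 / 0.119), so each group's filter sums to less than 1. In PyTorch, a per-group variant would correspond to reshaping the (n, group*k*k, h, w) tensor to (n, group, k*k, h, w) and applying softmax over dim 2; that too is only a hypothetical alternative for comparison.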