SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022.


Comparison with zero-padding version.

weigq opened this issue · comments

commented

Excellent work!
BTW, the paper claims that the proposed edge/corner neighborhood selection performs better than the zero-padding version. I am curious about the performance of the latter, which is not reported in the paper?

I am also interested in this claim, but did not find an ablation study on it.

Hello, and thank you for your interest.

Generally, we observed on-par or worse performance when using zero padding, and the gap widened as we scaled up or moved toward downstream tasks.
I should also note that with zero padding, the module would no longer be as expressive as Swin's shifted window attention (SWA), because of the reduced receptive field at border pixels. Additionally, with zero padding, the attention mechanism would not become equivalent to self-attention when the neighborhood size matches the feature map size.
In other words, zero padding is simply less expressive, and at best saves a negligible amount of compute, even with the CUDA kernel.
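To make the difference concrete, here is a minimal 1D sketch (not the library's actual implementation, and the function names are hypothetical) contrasting the two border-handling strategies. With edge/corner selection, the window is shifted so every query still attends to a full set of real tokens; with zero padding, border queries keep a centered window but some slots fall outside the input and carry no information.

```python
import numpy as np

def neighborhood_indices_clamped(length, kernel_size):
    """Edge/corner-style selection: shift (clamp) the window near the borders
    so every query attends to exactly `kernel_size` valid positions."""
    radius = kernel_size // 2
    idx = []
    for i in range(length):
        start = min(max(i - radius, 0), length - kernel_size)
        idx.append(list(range(start, start + kernel_size)))
    return np.array(idx)

def neighborhood_indices_zero_pad(length, kernel_size):
    """Zero-padding-style selection: keep the window centered on the query and
    mark out-of-bounds slots with -1 (to be zero-padded / masked), so border
    queries effectively attend to fewer real tokens."""
    radius = kernel_size // 2
    idx = []
    for i in range(length):
        row = [j if 0 <= j < length else -1
               for j in range(i - radius, i + radius + 1)]
        idx.append(row)
    return np.array(idx)

print(neighborhood_indices_clamped(6, 3))
# Query 0 attends to [0, 1, 2]: the window slides inward, receptive field stays full.
print(neighborhood_indices_zero_pad(6, 3))
# Query 0 attends to [-1, 0, 1]: one slot is padding, shrinking its effective receptive field.
```

Note that in the clamped version, setting `kernel_size == length` makes every query attend to every position, i.e. it reduces to self-attention, whereas in the zero-padded version border queries would still waste slots on padding.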

We may add our findings regarding the zero-padding version to our supplementary materials in a future revision.

I hope this helps.

commented

Thanks