JingyunLiang / SwinIR

SwinIR: Image Restoration Using Swin Transformer (official repository)

Home Page: https://arxiv.org/abs/2108.10257


Issue about the patch embedding.

ddghjikle opened this issue · comments

Hi, thanks very much for sharing this wonderful work. Looking at the definition of PatchEmbed(nn.Module), it seems that parameters such as patch_size and img_size are not actually used. It appears that the performance improvements of SwinIR come from the MSA and MLP layers; of course, the multiple skip connections in the RSTB and STL are also helpful. I am curious why SwinIR does not form patches from multiple pixels, for example with the PatchEmbed method used in
"Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions".
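To make the contrast concrete, here is a minimal NumPy sketch (not the repository's actual code; the function names are illustrative, and average pooling stands in for a strided-convolution patch embedding). A per-pixel embedding only reshapes the feature map and is fully invertible, while a p×p patch embedding collapses each patch into one token and discards the arrangement of pixels inside it:

```python
import numpy as np

def pixel_embed(x):
    # x: (C, H, W) feature map -> (H*W, C) tokens, one token per pixel.
    # Pure reshape: invertible, no spatial information is lost.
    C, H, W = x.shape
    return x.reshape(C, H * W).T

def patch_embed(x, p):
    # Patch embedding sketched as p x p average pooling:
    # (C, H, W) -> ((H//p)*(W//p), C). Averaging discards how pixels
    # are arranged inside each patch, so it is not invertible.
    C, H, W = x.shape
    pooled = x.reshape(C, H // p, p, W // p, p).mean(axis=(2, 4))
    return pooled.reshape(C, -1).T

x = np.random.rand(3, 8, 8)
tok_pix = pixel_embed(x)        # 64 tokens; x is exactly recoverable
tok_patch = patch_embed(x, 4)   # 4 tokens; within-patch detail is gone
```

Reshaping `tok_pix` back to (3, 8, 8) reproduces `x` exactly, whereas no such inverse exists for `tok_patch`.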

Your idea deserves a try. To be honest, I didn't try it because I believe image restoration is a very local problem: we should process the image pixel by pixel (attention between pixels). My intuition is that there would be a performance drop if we processed it patch by patch (attention between patches, as IPT does). Note that the patch embedding process (e.g., a strided convolution layer) is not invertible: the spatial information within a patch is lost, which may place an extreme burden on the model when it has to reconstruct pixels (pixels are our final goal) from a patch feature.

However, patch embedding would reduce the computation burden. You can try it and see the results.
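A rough back-of-the-envelope illustrates the savings (assuming global self-attention cost scales as N² · d and ignoring window partitioning and projection layers; the numbers are illustrative, and d = 60 matches the embedding dimension of the lightweight SwinIR configuration):

```python
def attn_cost(h, w, d, p=1):
    # Token count after p x p patch embedding; cost ~ N^2 * d
    # (assumption: global attention, projections ignored).
    n = (h // p) * (w // p)
    return n * n * d

d = 60                                 # lightweight SwinIR embed dim
pixel = attn_cost(64, 64, d)           # one token per pixel
patched = attn_cost(64, 64, d, p=4)    # 4x4 patch tokens
ratio = pixel // patched               # p^4 = 256x fewer attention FLOPs
```

Shrinking the token count by p² cuts the quadratic attention term by p⁴, which is exactly the computational appeal of patch embedding.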

Feel free to reopen the issue if you have more questions.