JingyunLiang / SwinIR

SwinIR: Image Restoration Using Swin Transformer (official repository)

Home Page: https://arxiv.org/abs/2108.10257


Issue about the patch embedding.

ddghjikle opened this issue · comments

Hi, thanks very much for sharing this wonderful work. Looking at the definition of PatchEmbed(nn.Module), it seems that parameters such as patch_size and img_size are not actually used. It appears that the performance improvements of SwinIR come from the MSA and MLP layers; of course, the multiple skip connections in the RSTB and STL are also helpful. I am curious why SwinIR does not form patches from multiple pixels, for example with the PatchEmbed method used in
"Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions".
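To make the contrast concrete, here is a minimal NumPy sketch (not the repository's actual code; the function names are illustrative, and average pooling stands in for a strided-convolution patch embedding). A per-pixel embedding only reshapes the feature map and is fully invertible, while a p×p patch embedding collapses each patch into one token and discards the arrangement of pixels inside it:

```python
import numpy as np

def pixel_embed(x):
    # x: (C, H, W) feature map -> (H*W, C) tokens, one token per pixel.
    # Pure reshape: invertible, no spatial information is lost.
    C, H, W = x.shape
    return x.reshape(C, H * W).T

def patch_embed(x, p):
    # Patch embedding sketched as p x p average pooling:
    # (C, H, W) -> ((H//p)*(W//p), C). Averaging discards how pixels
    # are arranged inside each patch, so it is not invertible.
    C, H, W = x.shape
    pooled = x.reshape(C, H // p, p, W // p, p).mean(axis=(2, 4))
    return pooled.reshape(C, -1).T

x = np.random.rand(3, 8, 8)
tok_pix = pixel_embed(x)        # 64 tokens; x is exactly recoverable
tok_patch = patch_embed(x, 4)   # 4 tokens; within-patch detail is gone
```

Reshaping `tok_pix` back to (3, 8, 8) reproduces `x` exactly, whereas no such inverse exists for `tok_patch`.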

Your idea deserves a try. To be honest, I didn't try it because I believe image restoration is a very local problem: we should process the image pixel by pixel (attention between pixels). My intuition is that there would be a performance drop if we processed it patch by patch (attention between patches, as IPT does). Note that the patch embedding process (e.g., a strided convolution layer) is not invertible: the spatial information within a patch is lost, which may place an extreme burden on the model when it has to reconstruct pixels (pixels are our final goal) from a patch feature.

However, patch embedding would reduce the computation burden. You can try it and see the results.
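A rough back-of-the-envelope illustrates the savings (assuming global self-attention cost scales as N² · d and ignoring window partitioning and projection layers; the numbers are illustrative, and d = 60 matches the embedding dimension of the lightweight SwinIR configuration):

```python
def attn_cost(h, w, d, p=1):
    # Token count after p x p patch embedding; cost ~ N^2 * d
    # (assumption: global attention, projections ignored).
    n = (h // p) * (w // p)
    return n * n * d

d = 60                                 # lightweight SwinIR embed dim
pixel = attn_cost(64, 64, d)           # one token per pixel
patched = attn_cost(64, 64, d, p=4)    # 4x4 patch tokens
ratio = pixel // patched               # p^4 = 256x fewer attention FLOPs
```

Shrinking the token count by p² cuts the quadratic attention term by p⁴, which is exactly the computational appeal of patch embedding.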

Feel free to reopen the issue if you have more questions.