JingyunLiang / SwinIR

SwinIR: Image Restoration Using Swin Transformer (official repository)

Home Page: https://arxiv.org/abs/2108.10257

patch_size

wudishuaishuainiu opened this issue

I noticed that the network's patch_size is set to 1 by default, so each pixel becomes a token. Why not use an image patch (e.g., 4×4) as the token instead?

Please refer to #14

Your idea deserves a try. To be honest, I didn't try it because I believe image restoration is a very local problem: we should process the image pixel by pixel (attention between pixels). My intuition is that performance would drop if we processed it patch by patch (attention between patches, as IPT does). The patch embedding step (e.g., a strided convolution layer) is not invertible, so the spatial information within a patch is lost. This may place an extreme burden on the model when it has to reconstruct pixels (our final goal) from a patch feature.
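A minimal NumPy sketch of the invertibility point, under toy assumptions: a hypothetical 3-channel 8×8 input, patch size p = 4, and a bias-free linear patch embedding (equivalent to a convolution with kernel = stride = p) onto D = 16 dimensions, fewer than the p·p·C = 48 values in each patch. With patch_size = 1, tokenization is a pure reshape and round-trips exactly. With the strided embedding, any direction in the weight matrix's null space can be added to a patch without changing its token, so two different images end up with identical embeddings. The helper names (`pixel_tokens`, `patch_embed`, and so on) are illustrative only and are not taken from the SwinIR code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): channels, image height/width, patch size, token dim.
C, H, W = 3, 8, 8
p, D = 4, 16  # D < p * p * C = 48, so the embedding must discard information


def pixel_tokens(img):
    """patch_size = 1: every pixel becomes one C-dimensional token (a pure reshape)."""
    c, h, w = img.shape
    return img.reshape(c, h * w).T  # (h*w, c)


def pixel_untokenize(tokens, h, w):
    """Inverse of pixel_tokens: (h*w, c) tokens back to a (c, h, w) image."""
    return tokens.T.reshape(-1, h, w)


def patch_embed(img, weight, p):
    """Linear patch embedding, equivalent to a bias-free conv with kernel = stride = p.

    Each p x p x C patch is flattened in (channel, row, col) order and
    multiplied by `weight` of shape (D, C * p * p).
    """
    c, h, w = img.shape
    patches = img.reshape(c, h // p, p, w // p, p).transpose(1, 3, 0, 2, 4)
    patches = patches.reshape((h // p) * (w // p), c * p * p)
    return patches @ weight.T  # (num_patches, D)


img = rng.standard_normal((C, H, W))
weight = rng.standard_normal((D, C * p * p))

# patch_size = 1 loses nothing: tokens map back to the exact image.
print(np.allclose(pixel_untokenize(pixel_tokens(img), H, W), img))  # True

# The last right-singular vector lies in the null space of `weight` (rank <= D < 48).
_, _, vt = np.linalg.svd(weight)
null_vec = vt[-1]

# Add that direction to the top-left patch: the pixels change, the token does not.
perturbed = img.copy()
perturbed[:, :p, :p] += null_vec.reshape(C, p, p)
print(np.abs(perturbed - img).max() > 0.1)  # True
print(np.allclose(patch_embed(perturbed, weight, p), patch_embed(img, weight, p)))  # True
```

If D were at least p·p·C, a linear embedding could in principle be invertible; the sketch only illustrates the lossy case where the token is smaller than the patch it summarizes.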

However, doing so would reduce the computational burden. You can try it and see the results.
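To make the computational trade-off concrete, here is a rough count of query-key pairs for one attention layer. It assumes window self-attention with an 8×8 window (the window size is an assumption here) on a hypothetical 64×64 input, and it ignores channels, the QKV projections and the MLP. Global attention is included only for comparison; the function name `attention_pairs` is illustrative.

```python
def attention_pairs(h, w, patch_size, window_size=8):
    """Rough query-key pair counts for one attention layer.

    Assumes the image is tokenized into (h / patch_size) * (w / patch_size)
    tokens and that the token grid divides evenly into windows.
    Ignores channels, projections and the MLP.
    """
    n_tokens = (h // patch_size) * (w // patch_size)
    window_pairs = n_tokens * window_size ** 2  # each token attends within its window
    global_pairs = n_tokens ** 2                # each token attends to all tokens
    return n_tokens, window_pairs, global_pairs


for p in (1, 4):
    n, win, full = attention_pairs(64, 64, p)
    print(f"patch_size={p}: tokens={n}, window pairs={win}, global pairs={full}")
# patch_size=1: tokens=4096, window pairs=262144, global pairs=16777216
# patch_size=4: tokens=256, window pairs=16384, global pairs=65536
```

With a fixed window, attention cost grows linearly with the number of tokens, so 4×4 patches cut it by 16×; under global attention the saving would be 256×. Each 8×8 window would also cover a 32×32 pixel area instead of 8×8 pixels.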

Feel free to reopen it if you have more questions.