JingyunLiang / SwinIR

SwinIR: Image Restoration Using Swin Transformer (official repository)

Home Page: https://arxiv.org/abs/2108.10257


The input size during test

IceClear opened this issue

Hi, Jingyun, nice work! I just wonder why SwinIR needs the 'img_size' argument. It is somewhat inconvenient, especially at test time, since we usually want to test on images of different sizes, right? Is there any particular reason for this? Swin Transformer does not need it because it uses padding operations. Besides, are there any requirements on the input size, e.g., must it be a multiple of some number? Thanks.

I'm sorry for the misunderstanding. In fact, SwinIR does not need it either. It can deal with images of different sizes because we pad them to a multiple of window_size.

We pass img_size=48 or 64 to main_test_swinir.py only to distinguish between the two pre-trained models that we provide:

  1. patch_size=48, dataset=DIV2K
  2. patch_size=64, dataset=DIV2K+Flickr2K

As can be seen in Table 2 of the paper, we train SwinIR under two different settings for classical image SR, for fair comparison with two different kinds of models.
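For later readers, a minimal sketch of how such a flag could map to the two released checkpoints; the checkpoint file names below follow the model-zoo naming convention and are assumptions here, not verified paths:

def pick_checkpoint(training_patch_size, scale=2):
    # Setting 1: trained on DIV2K with 48x48 patches (assumed file name).
    if training_patch_size == 48:
        return f'model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x{scale}.pth'
    # Setting 2: trained on DIV2K+Flickr2K with 64x64 patches (assumed file name).
    if training_patch_size == 64:
        return f'model_zoo/swinir/001_classicalSR_DF2K_s64w8_SwinIR-M_x{scale}.pth'
    raise ValueError('training_patch_size must be 48 or 64')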

Thanks for the reply. But I found that if I test with the default settings in this repo, I get a reshape error. I wonder if there is anything wrong with my code.

May I ask which dataset and which task you are using? It works fine for me. Could you print the network input shape before this line?

output = model(img_lq)

If the network input size is a multiple of window_size, there should be no problem.

Well, I just tested it on REDS to use it as a baseline. Thus, my input is (1, 3, 720//4, 1280//4), i.e., (1, 3, 180, 320). So do I need to change the window_size to make it work?

Got it! I changed window_size to 4 and it works. So if I just want to train a new network without fine-tuning, the setting of 'img_size' does not make a difference, right?
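For reference, the arithmetic behind the error and the fix (plain Python, nothing repository-specific):

h, w = 720 // 4, 1280 // 4   # REDS low-resolution input: 180 x 320
print(h % 8, w % 8)          # 4 0 -> 180 is not a multiple of the default window_size=8
print(h % 4, w % 4)          # 0 0 -> with window_size=4 no padding is needed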

  • I think you need to add the padding code as follows (lines 57 to 62 of main_test_swinir.py) before feeding your test images into the model; a fuller sketch follows after this list. The principle is that the network input size has to be a multiple of window_size.

_, _, h_old, w_old = img_lq.size()

  • I think you cannot change the window_size once the model is trained. This may lead to bad results.

  • If you want to train a new one, you can set your own img_size. (This is a bit confusing: here, img_size is just the size of the whole image we input into the network during training. It equals the patch_size that we often refer to in image SR.)
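As referenced in the first bullet above, here is a minimal sketch of that mirror-padding logic, following the idea in main_test_swinir.py; the helper name pad_and_restore is made up for illustration, and window_size/scale must match your model:

import torch

def pad_and_restore(model, img_lq, window_size=8, scale=4):
    # Mirror-pad the low-quality input so that height and width become
    # multiples of window_size, run the model, then crop the output back.
    _, _, h_old, w_old = img_lq.size()
    h_pad = (h_old // window_size + 1) * window_size - h_old
    w_pad = (w_old // window_size + 1) * window_size - w_old
    img_lq = torch.cat([img_lq, torch.flip(img_lq, [2])], 2)[:, :, :h_old + h_pad, :]
    img_lq = torch.cat([img_lq, torch.flip(img_lq, [3])], 3)[:, :, :w_old + w_pad, :]
    output = model(img_lq)
    # Discard the padded region in the upscaled output.
    return output[..., :h_old * scale, :w_old * scale]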

Well, in my setting, during training I crop the images into 64×64 patches, and for validation the input is the whole 180×320 image. I predefine the network before training and validation, so I wonder whether I have to change the img_size setting before validation.

No, you don't need to. This is exactly what I did in experiments.
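A toy demonstration of that point (the constructor arguments mimic the example at the bottom of network_swinir.py, but the exact values and the import path here are assumptions): one model instance, built with img_size=64, processes both a 64x64 training crop and a window-padded 180x320 validation image.

import torch
from models.network_swinir import SwinIR  # repository module; import path assumed

model = SwinIR(upscale=4, img_size=64, window_size=8, img_range=1.,
               depths=[2, 2], embed_dim=60, num_heads=[6, 6],
               mlp_ratio=2, upsampler='pixelshuffledirect')
model.eval()

with torch.no_grad():
    train_crop = torch.randn(1, 3, 64, 64)   # training-time patch
    val_img = torch.randn(1, 3, 180, 320)    # whole validation image
    # Pad height 180 -> 184 so both sides are multiples of window_size=8.
    val_img = torch.nn.functional.pad(val_img, (0, 0, 0, 4), mode='reflect')
    print(model(train_crop).shape, model(val_img).shape)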

Great! Nice talk with you and thanks for your patient reply :)


Hi, @JingyunLiang

Can I ask why we use cropped patches like $128 \times 128 \times 3$ (the so-called patch size) for training, instead of using the same size as in the validation phase?

Thank you.


@JingyunLiang Besides, the window size (namely the patch_size in your network_swinir.py) for denoising is set to 1; what does that mean? Can I understand it this way: there is actually no so-called window, and instead the denoising is based on pixel-wise attention?

  • In training, you need a fixed image size for batch training; in validation, different test images may have different sizes. The common practice is to randomly crop 64x64 or 128x128 image patches for training.
  • Sorry for the abuse of notation. Just forget all the notations/definitions and consider a concrete example. Given a 433x532 training image, we crop a 128x128 image patch for training. The 128x128 patch is divided into non-overlapping 8x8 windows. Inside each 8x8 window, we compute the attention matrix between every pair of pixels (each pixel acting as a 1x1 patch); a toy sketch follows below.
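As mentioned in the second bullet, a toy illustration (not the repository's own window_partition, which keeps a different layout) of splitting a 128x128 feature map into non-overlapping 8x8 windows and computing pixel-wise attention logits inside each window:

import torch

def toy_window_partition(x, window_size=8):
    # x: (B, H, W, C) feature map; H and W are assumed divisible by window_size.
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    # (B * num_windows, window_size*window_size, C): 64 pixel tokens per window.
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)

x = torch.randn(1, 128, 128, 96)           # one 128x128 training patch, 96 channels
tokens = toy_window_partition(x)           # (256, 64, 96): 256 windows of 64 pixels
attn = tokens @ tokens.transpose(-2, -1)   # (256, 64, 64) attention logits per window
print(tokens.shape, attn.shape)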

Thank you very much for the example. I totally understand your setting now. ^_^ @JingyunLiang