JingyunLiang / SwinIR

SwinIR: Image Restoration Using Swin Transformer (official repository)

Home Page: https://arxiv.org/abs/2108.10257

A question about the framework

yzcv opened this issue · comments

commented

Hi, @JingyunLiang

I appreciate your fabulous work, but I have a question about the framework. Did you ever try a UNet-like or encoder-decoder framework for the Deep Feature Extraction stage (the whole transformer part)? Since your framework consists entirely of identical RSTBs, I am wondering whether the encoder-decoder idea would bring a performance gain.

Thank you very much.

We mainly aimed at the SR problem at the beginning, which is why we use such a design. A UNet-like framework should also work. You can find similar ideas here, here and here.
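For readers skimming the thread, here is a minimal, hypothetical sketch (PyTorch) of what such a flat, resolution-preserving feature extractor looks like. The `ResidualBlockStub` below is only a convolutional stand-in for the actual Swin-based RSTB, and the names, channel count and block count are illustrative, not taken from the repository.

```python
import torch
import torch.nn as nn

class ResidualBlockStub(nn.Module):
    """Simplified stand-in for one RSTB (the real block stacks Swin Transformer
    layers plus a conv and a residual connection); used here only to show that
    the deep feature extractor keeps a constant spatial resolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual; same H x W in and out

class FlatFeatureExtractor(nn.Module):
    """Stack of identical resolution-preserving blocks, as in SwinIR's design."""
    def __init__(self, channels: int = 60, num_blocks: int = 6):
        super().__init__()
        self.blocks = nn.Sequential(
            *[ResidualBlockStub(channels) for _ in range(num_blocks)]
        )

    def forward(self, x):
        return x + self.blocks(x)  # global residual over the whole stack

if __name__ == "__main__":
    feat = torch.randn(1, 60, 48, 48)      # shallow features from the LR image
    out = FlatFeatureExtractor()(feat)
    print(out.shape)                        # torch.Size([1, 60, 48, 48]); resolution unchanged
```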

commented

So do you mean that your framework is more suitable for the SR task than a UNet-like or autoencoder-like transformer-based framework?

Thanks.

Possibly yes, based on my intuition, but we would need experiments to prove it.

commented

Thanks very much for the patient reply.

commented

Hi,

I now understand why this framework works well. Other tasks, such as segmentation or detection, need to extract high-level semantic information, so a UNet or autoencoder can contribute there. In contrast, SR aims to utilize all the information in the LR image, and we do not want to lose any low-level detail. As such, your framework is intuitively better suited to SR.
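To make the contrast concrete, a hypothetical UNet-like alternative for the same stage might look like the sketch below (again PyTorch, with illustrative names such as `UNetLikeExtractor`; none of this is part of SwinIR). Note how the encoder halves the spatial resolution, so low-level detail has to be recovered through upsampling and a skip connection rather than being carried at full resolution throughout, as in the flat RSTB stack.

```python
import torch
import torch.nn as nn

class UNetLikeExtractor(nn.Module):
    """Hypothetical encoder-decoder replacement for the deep feature extraction
    stage; shown only to illustrate the loss and recovery of spatial resolution."""
    def __init__(self, channels: int = 60):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)
        )
        self.down = nn.Conv2d(channels, channels * 2, 3, stride=2, padding=1)   # H/2 x W/2
        self.bottleneck = nn.Sequential(
            nn.Conv2d(channels * 2, channels * 2, 3, padding=1), nn.ReLU(inplace=True)
        )
        self.up = nn.ConvTranspose2d(channels * 2, channels, 2, stride=2)       # back to H x W
        self.dec = nn.Conv2d(channels * 2, channels, 3, padding=1)              # fuse with skip

    def forward(self, x):
        e = self.enc(x)
        b = self.bottleneck(self.down(e))            # coarse, semantically richer features
        u = self.up(b)
        return self.dec(torch.cat([u, e], dim=1))    # skip connection restores detail

if __name__ == "__main__":
    feat = torch.randn(1, 60, 48, 48)
    print(UNetLikeExtractor()(feat).shape)           # torch.Size([1, 60, 48, 48])
```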