cswry / SeeSR

[CVPR2024] SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

Not working for people

CuddleSabe opened this issue

Too scary...
[Screenshot 2023-12-26 17 19 44]

Hello, could you provide the corresponding LR images and the inference command you used?

[Images: 00004, 0012]
The command parameters are the defaults.

They are all RealSR images and have MPEG degradation.

[Images: person2, person1]

Hello, the two images above are our test results, obtained with the default command settings. They differ significantly from yours.

We observe that your super-resolution results have the same resolution as your input images. Please make sure to set --upscale to 4 during inference.
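A quick way to catch this mistake is to compare the output size with the input size multiplied by the upscale factor. The sketch below uses hypothetical helpers (`expected_output_size` and `check_upscale` are illustrative names, not part of SeeSR) and works on plain (width, height) tuples so it needs no image library.

```python
def expected_output_size(lr_size, upscale):
    """Return the (width, height) an SR result should have for a given LR size."""
    width, height = lr_size
    return (width * upscale, height * upscale)


def check_upscale(lr_size, sr_size, upscale=4):
    """Return True if the SR output size equals the LR size times `upscale`."""
    return tuple(sr_size) == expected_output_size(lr_size, upscale)


if __name__ == "__main__":
    # Hypothetical sizes: a 128x128 LR input should give a 512x512 result at --upscale 4.
    print(expected_output_size((128, 128), 4))
    # An output with the same size as the input means no upscaling was applied.
    print(check_upscale((128, 128), (128, 128)))
```

With Pillow installed, the tuples can come from `Image.open(path).size`, which returns (width, height).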

There are two likely reasons why the face in the second image still leaves room for improvement:

  • During training, we only simulated degradations such as noise, blur, and JPEG compression, following the Real-ESRGAN pipeline. For unseen degradations such as MPEG compression, the model's generalization ability is limited.
  • The current open-source model is trained for general scenes and may perform poorly in specialized domains such as faces. We plan to train a specialized version, SeeSR-face, for facial images. Please stay tuned for updates.
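To make the first point concrete, here is a toy, stdlib-only sketch of a single pass of a Real-ESRGAN-style degradation chain (blur, resize, noise, compression) on a grayscale image stored as a list of lists. Every step is a simplified stand-in chosen for illustration: a 3x3 box blur instead of random blur kernels, nearest-neighbour downsampling instead of random resizing, and uniform quantization instead of real JPEG encoding. All function names are hypothetical; Real-ESRGAN itself applies a randomized, two-stage version of this chain. Note that nothing here models MPEG-specific artifacts.

```python
import random


def box_blur(img):
    """3x3 mean filter with clamped borders (toy stand-in for random blur kernels)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            row.append(sum(vals) / 9.0)
        out.append(row)
    return out


def downsample(img, factor):
    """Nearest-neighbour downsampling by an integer factor (toy stand-in for resizing)."""
    return [row[::factor] for row in img[::factor]]


def add_noise(img, sigma, rng):
    """Additive Gaussian noise with standard deviation `sigma`."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in img]


def quantize(img, step):
    """Uniform quantization clamped to [0, 255] (crude stand-in for JPEG compression)."""
    return [[min(255.0, max(0.0, round(p / step) * step)) for p in row] for row in img]


def degrade(hr, factor=4, sigma=5.0, step=16, seed=0):
    """Apply blur -> downsample -> noise -> quantization in a fixed, seeded order."""
    rng = random.Random(seed)
    img = box_blur(hr)
    img = downsample(img, factor)
    img = add_noise(img, sigma, rng)
    return quantize(img, step)


if __name__ == "__main__":
    # A synthetic 16x16 gradient as the "HR" image; the result is a 4x4 "LR" image.
    hr = [[float((x + y) * 8 % 256) for x in range(16)] for y in range(16)]
    for row in degrade(hr):
        print(row)
```

Because the model only ever sees inputs produced by steps like these during training, artifacts from other sources (for example, MPEG blocking across frames) fall outside its training distribution.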

Thanks for your reply!
Well, that's what confuses me. In my experience, an x1 model trained with the Real-ESRGAN degradation can handle 4x-upsampled input (because of the resize step in the degradation), so why can't the x4 model handle x1 input?
LOL

SeeSR runs its diffusion process in latent space.

When you input an x1 LR image, its already low resolution is reduced even further by the VAE encoder, which downsamples the spatial resolution by a factor of 8. In such a small latent, it is difficult to preserve the spatial structure of the image.

This pushes the model toward uncontrolled generation, which also explains why the face in the second image looks somewhat strange.
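The arithmetic behind this explanation can be sketched as follows, assuming the image passed to the VAE encoder has the target resolution (input size times upscale) and that the encoder reduces each spatial dimension by 8, as stated in the reply. The function `latent_size` and the 128x128 input size are hypothetical, chosen only for illustration.

```python
VAE_DOWNSAMPLE = 8  # spatial reduction factor of the VAE encoder, per the reply above


def latent_size(lr_width, lr_height, upscale):
    """Spatial size of the latent for an LR image processed at the given upscale.

    Assumes the LR image is resized to the target resolution before encoding.
    """
    target_w, target_h = lr_width * upscale, lr_height * upscale
    return (target_w // VAE_DOWNSAMPLE, target_h // VAE_DOWNSAMPLE)


if __name__ == "__main__":
    # A hypothetical 128x128 LR input at --upscale 1 versus --upscale 4.
    for scale in (1, 4):
        w, h = latent_size(128, 128, scale)
        print(f"upscale={scale}: latent {w}x{h} ({w * h} spatial positions)")
```

Under these assumptions, a 128x128 input yields a 16x16 latent at --upscale 1 but a 64x64 latent at --upscale 4, so the x1 run has 16 times fewer latent positions in which to represent the same content.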