Does not work for people

CuddleSabe opened this issue 6 months ago

hht2001 · Answer 1 · Tue Dec 26 2023 17:20:26 GMT+0800 (China Standard Time)

Too scary...

Rongyuan Wu · Answer 2 · Tue Dec 26 2023 17:37:28 GMT+0800 (China Standard Time)

Hello, can you provide the corresponding LR image and the corresponding inference command?

hht2001 · Answer 3 · Tue Dec 26 2023 17:39:19 GMT+0800 (China Standard Time)

The command parameters are the defaults.

hht2001 · Answer 4 · Tue Dec 26 2023 17:42:06 GMT+0800 (China Standard Time)

They are all realsr images, and they have MPEG degradation.

Rongyuan Wu · Answer 5 · Tue Dec 26 2023 19:44:52 GMT+0800 (China Standard Time)

Hello, the two images above are the test results obtained on our end using the default command settings.
There are significant differences between our test results and yours.

We observe that the resolution of the super-resolution results you provided is equal to the resolution of your input images.
Please make sure to set --upscale to 4 during inference.
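
One way to confirm the symptom described above is to compare the input and output dimensions. The following is a minimal sketch, not part of the SeeSR code: the function names and the example sizes are hypothetical, and the (width, height) tuples are assumed to have been read from the image files beforehand.

```python
def effective_scale(lr_size, sr_size):
    """Return (width_ratio, height_ratio) of the SR output over the LR input.

    Both arguments are (width, height) tuples in pixels.
    """
    lr_w, lr_h = lr_size
    sr_w, sr_h = sr_size
    return sr_w / lr_w, sr_h / lr_h


def looks_upscaled(lr_size, sr_size, expected=4):
    """True if the output is exactly `expected` times larger on both axes."""
    width_ratio, height_ratio = effective_scale(lr_size, sr_size)
    return width_ratio == expected and height_ratio == expected


# Hypothetical sizes: a 128x128 input and two possible outputs.
print(looks_upscaled((128, 128), (512, 512)))  # output is 4x the input
print(looks_upscaled((128, 128), (128, 128)))  # output equals the input (x1)
```

If the second case applies to your results, the inference was effectively run at x1 rather than x4.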

There is room for improvement in the facial results of the second image, which could be attributed to two possible reasons:

1. During training, we only simulated degradations such as noise, blur, and JPEG compression, following the Real-ESRGAN pipeline. For unseen degradations such as MPEG degradation, the model's generalization ability is limited.
2. The currently open-sourced model is trained for general scenarios and may have limited performance in specialized scenarios such as faces. We plan to train a specialized version called SeeSR-face specifically for facial scenarios. Please stay tuned for updates on this.

hht2001 · Answer 6 · Wed Dec 27 2023 09:56:59 GMT+0800 (China Standard Time)

Thanks for your reply!
Well, that is what confuses me. In my experience, an x1 model trained with the Real-ESRGAN degradation is able to process 4x-upsampled input (because of the resize step in the degradation), so why can't the x4 model process x1-sampled input?
LOL

Rongyuan Wu · Answer 7 · Wed Dec 27 2023 10:59:48 GMT+0800 (China Standard Time)

SeeSR runs its diffusion process in the latent space.

When you input an x1 LR image, its already low resolution is reduced even further by the VAE encoder's compression (the spatial resolution shrinks by a factor of 8). In that state, it is difficult to preserve the spatial structure within such a small latent space.

This pushes the model toward uncontrollable generation, which also explains why the second facial image appears somewhat strange.
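
The arithmetic behind this can be sketched as follows. This is only an illustration under stated assumptions: the LR image is assumed to be resized by the upscale factor before it reaches the VAE encoder, the 8x reduction is the figure given above, and the function name and the 128x128 example size are hypothetical.

```python
def latent_size(lr_w, lr_h, upscale, vae_factor=8):
    """Return the (width, height) of the latent for a given LR input.

    Assumption: the LR image is first resized by `upscale`, then the
    VAE encoder reduces each spatial dimension by `vae_factor`.
    """
    up_w, up_h = lr_w * upscale, lr_h * upscale
    return up_w // vae_factor, up_h // vae_factor


# Hypothetical 128x128 LR image.
w4, h4 = latent_size(128, 128, upscale=4)  # 512x512 image, 64x64 latent
w1, h1 = latent_size(128, 128, upscale=1)  # 128x128 image, 16x16 latent

print((w4, h4), w4 * h4)  # latent grid and number of positions at x4
print((w1, h1), w1 * h1)  # latent grid and number of positions at x1
```

Under these assumptions the x1 run leaves the model with 16 times fewer latent positions to hold the same scene, which is consistent with the loss of spatial structure described above.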