IceClear / StableSR

Exploiting Diffusion Prior for Real-World Image Super-Resolution

Home Page: https://iceclear.github.io/projects/stablesr/


Additional comparisons to Tiled DDPM, ControlNet Tile, Loopback Scaler and DeepFloyd.

UIUC-Marisa3 opened this issue

Hello, thanks for the work! We see many classic SR methods in the paper. The comparison to Real-ESRGAN+ looks promising!

However, it seems that the paper wants to claim that “our method using both synthetic and real world benchmarks demonstrates its superiority over current state-of-the-art approaches”. Just wondering, could we have some comparisons against practical baselines and the more common methods that people actually use?

For example:

Tiled diffusion’s DDIM inversion:
https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111

ControlNet Tile’s updates yesterday (it looks like they are going to use this SR-like model to compete with Midjourney V5/5.1 in image details):
https://github.com/lllyasviel/ControlNet-v1-1-nightly#ControlNet-11-Tile

Loopback Scaler:
https://civitai.com/models/23188/loopback-scaler

DeepFloyd’s 256 stage model (IF-III-L):
https://github.com/deep-floyd/IF

Some of these methods are likely to use prompts, but getting a prompt from a small image is trivial with BLIP, and all ControlNets have a ‘guess mode’ that accepts an empty string as the prompt. Loopback Scaler and Tiled Diffusion seem to suggest always using the same prompt string regardless of the image, so they do not really require prompts (see the BLIP sketch below).
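
For reference, a minimal sketch of pulling a prompt out of a small image with BLIP via Hugging Face transformers (the checkpoint name and file path here are illustrative assumptions, not a requirement of any method above):

```python
# Minimal sketch: caption a low-res image with BLIP and use the caption as a prompt.
# The checkpoint and input path are placeholders.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("low_res_input.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
caption_ids = model.generate(**inputs, max_new_tokens=30)
prompt = processor.decode(caption_ids[0], skip_special_tokens=True)
print(prompt)  # use this string as the prompt for the upscaler
```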

Most of these methods can be easily used by installing the latest version of Automatic1111.

Yes, I also want a visual comparison.

If your method is competitive (for example, if it can upscale to 4K images like the ControlNet Tile model), I will be happy to migrate your method to Automatic1111.

By the way, I'm also studying at NTU. We may have an opportunity to cooperate!

Hi, thanks for your interest in our work!
We currently do not compare StableSR with these open-sourced demos in our paper due to the following reasons:
(1) These open-sourced demos are not academic papers formally accepted by conferences or journals after official reviews.
(2) Our currently released code and paper were finished around March, though only made publicly available recently, and we did not notice these demos at that time.

We appreciate your valuable advice and we will go through these demos later.
We will provide visual comparisons soon :)
BTW, we will revise the title of the issue for easier understanding.

Next, we will compare with these baselines one by one.

Comparison with Tiled DDPM:
We first test on the image from the commonly used real-world test set here. For Tiled DDPM, we use the same pretrained diffusion model as StableSR (v2-1_512-ema-pruned.ckpt) and follow most of the settings provided by Tiled DDPM. We use a large number of sampling steps for better performance, and the prompts are the same as those of Tiled DDPM:
[image]

Result of Tiled DDPM:
[image]

Result of StableSR:
[image]

We observe that Tiled DDPM tends to struggle with both fidelity and quality in real-world cases.

We further show an example on AIGC SR, though StableSR is not designed for AIGC and never sees such data during training. We directly test on the image provided by Tiled DDPM; the generated image is in 4K resolution:
[image: StableSR result]
[image: comparison with zoomed LR]
StableSR shows better fidelity compared with the result of Tiled DDPM.

Thanks for your effort in testing.

It seems that your model is compatible with my tiled diffusion method (that is, only tiling, no advanced algorithm involved; a rough sketch of the idea is below). Would you mind if I migrate your model to Automatic1111?

Or if you want to start the project on your own, I may be able to help.
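
For clarity, here is a rough sketch of the plain tiling idea I mean (illustrative only, not the actual extension code; `process_tile`, the tile size and the overlap are placeholder assumptions):

```python
# Rough, illustrative sketch of "split into overlapping tiles, process each tile,
# then blend by averaging the overlaps". process_tile() stands in for whatever
# per-tile SR/diffusion step is plugged in; it is assumed to return the tile
# upscaled by `scale`.
import numpy as np

def upscale_tiled(image, process_tile, tile=512, overlap=64, scale=4):
    h, w, c = image.shape
    out = np.zeros((h * scale, w * scale, c), dtype=np.float32)
    weight = np.zeros((h * scale, w * scale, 1), dtype=np.float32)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            # clamp the tile so it stays inside the image
            y0, x0 = min(y, max(h - tile, 0)), min(x, max(w - tile, 0))
            patch = image[y0:y0 + tile, x0:x0 + tile]
            sr = process_tile(patch)  # one SR pass on this tile
            ys, xs = y0 * scale, x0 * scale
            out[ys:ys + sr.shape[0], xs:xs + sr.shape[1]] += sr
            weight[ys:ys + sr.shape[0], xs:xs + sr.shape[1]] += 1.0
    return out / np.maximum(weight, 1e-8)  # average overlapping regions
```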


Hi~ Thanks for your interest.
I am OK with that. Automatic1111 is a popular repo and we are glad to see that our research can contribute to practical use.
Just remember to include our license : )

Honestly, the main purpose of this paper is just to attempt to make contributions to the research community, even if the contributions may be tiny.
We do not mean to list and 'K.O.' all the other baselines in the world.
StableSR is good but not perfect, and we appreciate suggestions and efforts that can make StableSR better.

commented

StableSR is so far the best identity-preserving scaling method out there. Meaning that if you downscale the result back to its original resolution, each pixel should average back to its original value and it shouldn't make up features larger than the pixels, while the new details should look plausible and not like a mere filter.

Comparison between StableSR minus base image and Tiled DDPM minus base image, using the high-res image provided on @pkuliyi2015's GitHub page:
[image: StableSR difference]
[image: Tiled DDPM difference]
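
For reference, one way to compute such a difference map (a minimal sketch; the file names, the choice of downscaling filter, and whether the comparison above did exactly this are assumptions):

```python
# Minimal sketch of the identity-preservation check described above: bring the SR
# result back to the base image's resolution and visualize |SR_downscaled - base|.
# File names are placeholders.
import numpy as np
from PIL import Image

base = Image.open("base_image.png").convert("RGB")
sr = Image.open("sr_result.png").convert("RGB")

sr_down = sr.resize(base.size, Image.LANCZOS)  # back to the base resolution
diff = np.abs(np.asarray(sr_down, dtype=np.int16)
              - np.asarray(base, dtype=np.int16)).astype(np.uint8)

Image.fromarray(diff).save("difference_map.png")  # darker = better identity preservation
print("mean absolute difference:", diff.mean())
```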

For the comparison with ControlNet Tile: it seems it is still being updated and is not fully integrated into A1111. The gradio demo they provide currently does not support upscaling in tiles, and unfortunately I am not familiar with gradio and failed to build it into A1111 after trying for two days, so I skip this comparison for now.
However, from the results shown in their readme, I conjecture that the fidelity of the results may not be very good, and whether ControlNet Tile can be directly applied to real-world images with unknown degradation is also a question.
BTW, our StableSR has been fully released and anyone interested in it is welcome to conduct the comparison : )

Comparison with Loopback Scaler:
I ran Loopback Scaler on A1111 and it reports a "NaN error" on the tiger image used above, and I did not figure out the reason.
However, I managed to run it on another example from the internet:
[image: example input]
I use the same prompt as used in Tiled DDPM.
I use the same pretrained diffusion model as StableSR (v2-1_512-ema-pruned.ckpt) and other settings are shown below:
[screenshot of settings]

Result of Loopback Scaler:
[image]

Result of StableSR:
[image]

Similarly, we observe that Loopback Scaler has inferior performance in this real-world case.

Comparison with DeepFloyd:
I use the stage 3 model for 4x upsampling.
I use the same prompts as in the above test, and the noise level is set to the default of 100.
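
For context, a minimal sketch of what an x4 diffusion upscaling call with an explicit noise level looks like through diffusers (the checkpoint ID, prompt, dtype and file paths are illustrative assumptions, not necessarily the exact stage-3 setup used here):

```python
# Illustrative only: a generic x4 diffusion upscaler call with noise_level=100 via
# diffusers. The checkpoint and arguments are assumptions, not the exact setup above.
import torch
from diffusers import StableDiffusionUpscalePipeline
from PIL import Image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = Image.open("input_256.png").convert("RGB")  # placeholder input
result = pipe(
    prompt="<same prompt as in the tests above>",  # placeholder prompt
    image=low_res,
    noise_level=100,
).images[0]
result.save("upscaled_x4.png")
```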

Result of DeepFloyd:
[image]

Obviously, it still mainly suffers from a fidelity issue, while the quality of some detailed textures is also not as good as that of StableSR.

Conclusion: As observed in the comparisons above, our StableSR differs significantly from the above diffusion-based upscalers in that it achieves higher fidelity, which is also the main challenge of applying a diffusion prior to SR, as discussed in our paper.
We think the comparisons are not mainly about which method is the best; they just indicate that we focus on different applications.

Specifically, the above upscalers still focus on 'creation', and they mainly handle AIGC images whose degradation is different from real-world images captured by cameras. Hence, they mainly care about generation quality, which means generating new content in the upscaled results is allowed.
However, for real-world image SR, fidelity is very important, and existing methods such as Real-ESRGAN+ and LDM are actually the common methods that people use. Our StableSR mainly focuses on this direction, and we attempt to keep high fidelity using several strategies introduced in our paper.
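
As a side note, one crude way to quantify the fidelity aspect discussed above is a consistency PSNR between the downscaled SR output and the LR input. The sketch below is illustrative only (the file paths and downscaling filter are assumptions) and is not a metric from our paper:

```python
# Crude numeric fidelity proxy: PSNR between the SR output downscaled to the input
# size and the LR input itself. Paths and the resampling filter are placeholders.
import numpy as np
from PIL import Image

def psnr(a, b):
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

lr = np.asarray(Image.open("lr_input.png").convert("RGB"))
sr = Image.open("sr_output.png").convert("RGB")
sr_down = np.asarray(sr.resize((lr.shape[1], lr.shape[0]), Image.LANCZOS))

print(f"consistency PSNR: {psnr(sr_down, lr):.2f} dB")  # higher = closer to the input
```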

We believe this is not the end, but the beginning to explore the powerful ability of diffusion models for image restoration.