IceClear / StableSR

Exploiting Diffusion Prior for Real-World Image Super-Resolution

Home Page: https://iceclear.github.io/projects/stablesr/


Additional comparisons to Tiled DDPM, ControlNet Tile, Loopback Scaler and DeepFloyd.

UIUC-Marisa3 opened this issue

Hello, thanks for the work! We see many classic SR methods in the paper. The comparison to Real-ESRGAN+ looks promising!

However, it seems that the paper wants to claim that “our method using both synthetic and real world benchmarks demonstrates its superiority over current state-of-the-art approaches”. Just wondering, could we have some comparisons against practical baselines and the more common methods that people actually use?

For example:

Tiled diffusion’s DDIM inversion:
https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111

ControlNet Tile’s updates yesterday (it looks like they are going to use this SR-like model to compete with Midjourney V5/5.1 in image details):
https://github.com/lllyasviel/ControlNet-v1-1-nightly#ControlNet-11-Tile

Loopback Scaler:
https://civitai.com/models/23188/loopback-scaler

DeepFloyd’s 256 stage model (IF-III-L):
https://github.com/deep-floyd/IF

Some of these methods are likely to use prompts, but getting a prompt from a small image is trivial with BLIP, and all ControlNets have a ‘guess mode’ that accepts an empty string as the prompt. Loopback Scaler and Tiled Diffusion seem to suggest always using the same prompt string regardless of the image, so they do not really require prompts (see the BLIP sketch below).
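
For reference, a minimal sketch of pulling a prompt out of a small image with BLIP via Hugging Face transformers (the checkpoint name and file path here are illustrative assumptions, not a requirement of any method above):

```python
# Minimal sketch: caption a low-res image with BLIP and use the caption as a prompt.
# The checkpoint and input path are placeholders.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("low_res_input.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
caption_ids = model.generate(**inputs, max_new_tokens=30)
prompt = processor.decode(caption_ids[0], skip_special_tokens=True)
print(prompt)  # use this string as the prompt for the upscaler
```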

Most of these methods can be easily used by installing the latest version of Automatic1111.

Yes, I also want a visual comparison.

If your method is competitive (for example, if it can upscale to 4K images like the ControlNet Tile model), I will be happy to migrate your method to Automatic1111.

By the way, I'm also studying at NTU. We may have an opportunity to cooperate!

Hi, thanks for your interest in our work!
We currently do not compare StableSR with these open-sourced demos in our paper due to the following reasons:
(1) These open-sourced demos are not academic papers formally accepted by conferences or journals after official reviews.
(2) Our currently released code and paper were finished around March, though only made publicly available recently, and we did not notice these demos at that time.

We appreciate your valuable advice and we will go through these demos later.
We will provide visual comparisons soon :)
BTW, we will revise the title of the issue for easier understanding.

Next, we will compare with these baselines one by one.

Comparison with Tiled DDPM:
We first test on the image from the commonly used real-world test set here. For Tiled DDPM, we use the same pretrained diffusion model as StableSR (v2-1_512-ema-pruned.ckpt) and follow most of the settings provided by Tiled DDPM. We use a large number of sampling steps for better performance, and the prompts are the same as those of Tiled DDPM:
[image]

Result of Tiled DDPM:
[image]

Result of StableSR:
[image]

We observe that Tiled DDPM tends to struggle with both fidelity and quality in real-world cases.

We further show an example on AIGC SR, though StableSR is not designed for AIGC and never sees such data during training. We directly test on the image provided by Tiled DDPM; the generated image is in 4K resolution:
[image: StableSR result]
[image: comparison with zoomed LR]
StableSR shows better fidelity compared with the result of Tiled DDPM.

Thanks for your effort in testing.

It seems that your model is compatible with my tiled diffusion method (that is, only tiling, no advanced algorithm involved; a rough sketch of the idea is below). Would you mind if I migrate your model to Automatic1111?

Or if you want to start the project on your own, I may be able to help.
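
For clarity, here is a rough sketch of the plain tiling idea I mean (illustrative only, not the actual extension code; `process_tile`, the tile size and the overlap are placeholder assumptions):

```python
# Rough, illustrative sketch of "split into overlapping tiles, process each tile,
# then blend by averaging the overlaps". process_tile() stands in for whatever
# per-tile SR/diffusion step is plugged in; it is assumed to return the tile
# upscaled by `scale`.
import numpy as np

def upscale_tiled(image, process_tile, tile=512, overlap=64, scale=4):
    h, w, c = image.shape
    out = np.zeros((h * scale, w * scale, c), dtype=np.float32)
    weight = np.zeros((h * scale, w * scale, 1), dtype=np.float32)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            # clamp the tile so it stays inside the image
            y0, x0 = min(y, max(h - tile, 0)), min(x, max(w - tile, 0))
            patch = image[y0:y0 + tile, x0:x0 + tile]
            sr = process_tile(patch)  # one SR pass on this tile
            ys, xs = y0 * scale, x0 * scale
            out[ys:ys + sr.shape[0], xs:xs + sr.shape[1]] += sr
            weight[ys:ys + sr.shape[0], xs:xs + sr.shape[1]] += 1.0
    return out / np.maximum(weight, 1e-8)  # average overlapping regions
```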


Hi~ Thanks for your interest.
I am OK with that. Automatic1111 is a popular repo and we are glad to see that our research can contribute to practical use.
Just remember to include our license : )

Honestly, the main purpose of this paper is just to attempt to make contributions to the research community, even if the contributions may be tiny.
We do not mean to list and 'K.O.' all the other baselines in the world.
StableSR is good but not perfect, and we appreciate suggestions and efforts that can make StableSR better.

commented

StableSR is so far the best identity-preserving scaling method out there. Meaning that if you downscale the result back to its original resolution, each pixel should average back to its original value and it shouldn't make up features larger than the pixels, while the new details should look plausible and not like a mere filter.

Comparison between StableSR minus base image and Tiled DDPM minus base image, using the high-res image provided on @pkuliyi2015's GitHub page:
[image: StableSR difference]
[image: Tiled DDPM difference]
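
For reference, one way to compute such a difference map (a minimal sketch; the file names, the choice of downscaling filter, and whether the comparison above did exactly this are assumptions):

```python
# Minimal sketch of the identity-preservation check described above: bring the SR
# result back to the base image's resolution and visualize |SR_downscaled - base|.
# File names are placeholders.
import numpy as np
from PIL import Image

base = Image.open("base_image.png").convert("RGB")
sr = Image.open("sr_result.png").convert("RGB")

sr_down = sr.resize(base.size, Image.LANCZOS)  # back to the base resolution
diff = np.abs(np.asarray(sr_down, dtype=np.int16)
              - np.asarray(base, dtype=np.int16)).astype(np.uint8)

Image.fromarray(diff).save("difference_map.png")  # darker = better identity preservation
print("mean absolute difference:", diff.mean())
```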

For the comparison with ControlNet Tile: it seems it is still being updated and is not fully integrated into A1111. The gradio demo they provide currently does not support upscaling in tiles, and unfortunately I am not familiar with gradio and failed to build it into A1111 after trying for two days, so I skip this comparison for now.
However, from the results shown in their readme, I conjecture that the fidelity of the results may not be very good, and whether ControlNet Tile can be directly applied to real-world images with unknown degradation is also a question.
BTW, our StableSR has been fully released and anyone interested in it is welcome to conduct the comparison : )

Comparison with Loopback Scaler:
I ran Loopback Scaler on A1111 and it reports a "NaN error" on the tiger image used above, and I did not figure out the reason.
However, I managed to run it on another example from the internet:
[image: example input]
I use the same prompt as used in Tiled DDPM.
I use the same pretrained diffusion model as StableSR (v2-1_512-ema-pruned.ckpt) and other settings are shown below:
[screenshot of settings]

Result of Loopback Scaler:
[image]

Result of StableSR:
[image]

Similarly, we observe that Loopback Scaler has inferior performance in this real-world case.

Comparison with DeepFloyd:
I use the stage 3 model for 4x upsampling.
I use the same prompts as in the above test, and the noise level is set to the default of 100.
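
For context, a minimal sketch of what an x4 diffusion upscaling call with an explicit noise level looks like through diffusers (the checkpoint ID, prompt, dtype and file paths are illustrative assumptions, not necessarily the exact stage-3 setup used here):

```python
# Illustrative only: a generic x4 diffusion upscaler call with noise_level=100 via
# diffusers. The checkpoint and arguments are assumptions, not the exact setup above.
import torch
from diffusers import StableDiffusionUpscalePipeline
from PIL import Image

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = Image.open("input_256.png").convert("RGB")  # placeholder input
result = pipe(
    prompt="<same prompt as in the tests above>",  # placeholder prompt
    image=low_res,
    noise_level=100,
).images[0]
result.save("upscaled_x4.png")
```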

Result of DeepFloyd:
[image]

Obviously, it still mainly suffers from a fidelity issue, while the quality of some detailed textures is also not as good as that of StableSR.

Conclusion: As observed in the comparisons above, our StableSR differs significantly from the above diffusion-based upscalers in that it achieves higher fidelity, which is also the main challenge of applying a diffusion prior to SR, as discussed in our paper.
We think the comparisons are not mainly about which method is the best; they just indicate that we focus on different applications.

Specifically, the above upscalers still focus on 'creation', and they mainly handle AIGC images whose degradation is different from real-world images captured by cameras. Hence, they mainly care about generation quality, which means generating new content in the upscaled results is allowed.
However, for real-world image SR, fidelity is very important, and existing methods such as Real-ESRGAN+ and LDM are actually the common methods that people use. Our StableSR mainly focuses on this direction, and we attempt to keep high fidelity using several strategies introduced in our paper.
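
As a side note, one crude way to quantify the fidelity aspect discussed above is a consistency PSNR between the downscaled SR output and the LR input. The sketch below is illustrative only (the file paths and downscaling filter are assumptions) and is not a metric from our paper:

```python
# Crude numeric fidelity proxy: PSNR between the SR output downscaled to the input
# size and the LR input itself. Paths and the resampling filter are placeholders.
import numpy as np
from PIL import Image

def psnr(a, b):
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

lr = np.asarray(Image.open("lr_input.png").convert("RGB"))
sr = Image.open("sr_output.png").convert("RGB")
sr_down = np.asarray(sr.resize((lr.shape[1], lr.shape[0]), Image.LANCZOS))

print(f"consistency PSNR: {psnr(sr_down, lr):.2f} dB")  # higher = closer to the input
```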

We believe this is not the end, but the beginning to explore the powerful ability of diffusion models for image restoration.