AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Feature Request]: new sampler "lcm" for lcm-loras

light-and-ray opened this issue · comments

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

This feature doesn't require much: no new model class support, only a new sampler.

LCM had a major update, and now we can use it like a regular LoRA:
https://huggingface.co/latent-consistency/lcm-lora-sdv1-5
https://huggingface.co/latent-consistency/lcm-lora-ssd-1b
https://huggingface.co/latent-consistency/lcm-lora-sdxl

We only need to rename them and put them into the LoRA models directory, then set Sampling steps to 4, CFG Scale to 1.0, and the Sampling method to "DPM2" or "Euler a". This gives decent results, but for best quality it needs a dedicated sampler, which is similar to the existing ones with a small change. You can see how it works in ComfyUI: comfyanonymous/ComfyUI@002aefa
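
For reference, this is roughly what the same recipe looks like in a plain diffusers pipeline (just an illustration, not webui code; the base model id below is only an example):

import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

# any SD 1.5 checkpoint works the same way; this id is only an example
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# swap in the LCM scheduler and load the LCM-LoRA weights
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# the same settings as above: very few steps, CFG around 1
image = pipe("a photo of a cat", num_inference_steps=4, guidance_scale=1.0).images[0]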

Proposed workflow

  1. Go to Sampling method
  2. Select LCM sampler

Additional information

How it works now
Screenshot 2023-11-12 at 00-13-13 Stable Diffusion

ComfyUI's recent update:
Screenshot_20231112_001837

I'm awaiting the integration of the LCM sampler into AUTOMATIC1111. While AUTOMATIC1111 is an excellent program, the implementation of new features, such as the LCM sampler and the consistency VAE, appears to be sluggish. The consistency VAE is currently only accessible in beta, while ComfyUI already offers both consistency VAE and LCM sampler support in no time :(

Let's not forget that TensorRT for SDXL is sitting on the Dev branch, too. I desperately need all three of these moved up into the Main branch.

let's hope this will be soon 🤞🤞🤞🤞

Someone reported nearly 5 gens per second on a 3090 with both TensorRT and LCM Lora. We're living in a pretty good timeline over here. Can't wait to see all this stuff get in so we can use it all together. Never ever thought I'd see a factor of 15+ speed up this quick.

Makes me feel happy with my 3090 Ti!
Can you elaborate more on TensorRT? Is that something that can be enabled in AUTOMATIC1111? Aren't SDP and xformers the only options?

I could only make TensorRT work with LCM after merging the LCM LoRA into the model (and training the difference in SuperMerger).

I'm looking forward to this feature too.
I want to use ControlNets with it.

I just installed TensorRT and yeah, it's amazing (3090 Ti).
SDXL 1024x1024 went from 11 seconds to 3.5 seconds an image! It's a must!

@iChristGit I assume you ran this on the Dev branch?

Yeah git checkout dev

Someone on Reddit implemented it 4 days ago:
https://www.reddit.com/r/StableDiffusion/comments/17ti2zo/you_can_add_the_lcm_sampler_to_a1111_with_a/

You can add the LCM sampler to A1111 with a little trick

So I was trying out the new LCM LoRA and found out the sampler is missing in A1111. As a long shot I just copied the code from Comfy, and to my surprise it seems to work. I think.

You have to make two small edits with a text editor. Here's how you do it:

Edit the file sampling.py found at this path:

...\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py

Add the following at the end of the file:

@torch.no_grad()
def sample_lcm(model, x, sigmas, extra_args=None, callback=None, disable=None, noise_sampler=None):
    # torch, trange and default_noise_sampler are already imported at the top of sampling.py
    extra_args = {} if extra_args is None else extra_args
    noise_sampler = default_noise_sampler(x) if noise_sampler is None else noise_sampler
    s_in = x.new_ones([x.shape[0]])
    for i in trange(len(sigmas) - 1, disable=disable):
        # ask the model for a fully denoised prediction at the current sigma
        denoised = model(x, sigmas[i] * s_in, **extra_args)
        if callback is not None:
            callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})

        # LCM step: take the denoised prediction and re-noise it to the next sigma level
        x = denoised
        if sigmas[i + 1] > 0:
            x += sigmas[i + 1] * noise_sampler(sigmas[i], sigmas[i + 1])
    return x

The second change is done in the file sd_samplers_kdiffusion.py found here:

...\stable-diffusion-webui-new\modules\sd_samplers_kdiffusion.py

On line 39 add this:

('LCM Test', 'sample_lcm', ['lcm'], {}),

That should give you a new sampler option called 'LCM Test'.

(screenshot: the new 'LCM Test' sampler option)
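
If you run the webui with --api, a quick way to confirm the new sampler was registered is to query the samplers endpoint (a sketch; host and port assume a default local install):

import requests

# /sdapi/v1/samplers returns the registered sampler names and aliases
samplers = requests.get("http://127.0.0.1:7860/sdapi/v1/samplers").json()
print([s["name"] for s in samplers])  # should now include 'LCM Test'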

But a bug was also found in the LCM scheduler, and the algorithm will be updated:
huggingface/diffusers#5815

There is almost no difference between "Euler a" and "LCM Test" for SD1, but for SDXL it solves all the problems.

(comparison images: SDXL and SD 1.5)

I've modified the patch from Reddit so it doesn't edit the external repository:

diff --git a/modules/sd_samplers_extra.py b/modules/sd_samplers_extra.py
index 1b981ca8..d154a2b6 100644
--- a/modules/sd_samplers_extra.py
+++ b/modules/sd_samplers_extra.py
@@ -72,3 +72,20 @@ def restart_sampler(model, x, sigmas, extra_args=None, callback=None, disable=No
         last_sigma = new_sigma

     return x
+
+
+@torch.no_grad()
+def sample_lcm(model, x, sigmas, extra_args=None, callback=None, disable=None, noise_sampler=None):
+    extra_args = {} if extra_args is None else extra_args
+    noise_sampler = k_diffusion.sampling.default_noise_sampler(x) if noise_sampler is None else noise_sampler
+    s_in = x.new_ones([x.shape[0]])
+    for i in tqdm.auto.trange(len(sigmas) - 1, disable=disable):
+        denoised = model(x, sigmas[i] * s_in, **extra_args)
+        if callback is not None:
+            callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
+
+        x = denoised
+        if sigmas[i + 1] > 0:
+            x += sigmas[i + 1] * noise_sampler(sigmas[i], sigmas[i + 1])
+    return x
+
diff --git a/modules/sd_samplers_kdiffusion.py b/modules/sd_samplers_kdiffusion.py
index 8a8c87e0..b6c3dc44 100644
--- a/modules/sd_samplers_kdiffusion.py
+++ b/modules/sd_samplers_kdiffusion.py
@@ -36,6 +36,7 @@ samplers_k_diffusion = [
     ('DPM2 a Karras', 'sample_dpm_2_ancestral', ['k_dpm_2_a_ka'], {'scheduler': 'karras', 'discard_next_to_last_sigma': True, "uses_ensd": True, "second_order": True}),
     ('DPM++ 2S a Karras', 'sample_dpmpp_2s_ancestral', ['k_dpmpp_2s_a_ka'], {'scheduler': 'karras', "uses_ensd": True, "second_order": True}),
     ('Restart', sd_samplers_extra.restart_sampler, ['restart'], {'scheduler': 'karras', "second_order": True}),
+    ('LCM Test', sd_samplers_extra.sample_lcm, ['lcm'], {}),
 ]

I have wrapped this patch in an extension! 🎉
https://github.com/light-and-ray/sd-webui-lcm-sampler
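
Once the patch (or the extension) is in place, here is a rough sketch of driving the new sampler through the standard txt2img API (assuming the webui was launched with --api and the LoRA file was renamed to lcm-lora-sdv1-5):

import requests

payload = {
    "prompt": "a photo of a cat <lora:lcm-lora-sdv1-5:1>",  # LoRA filename is an assumption
    "steps": 4,
    "cfg_scale": 1.0,
    "sampler_name": "LCM Test",  # the label registered in sd_samplers_kdiffusion.py above
    "width": 512,
    "height": 512,
}
response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
response.raise_for_status()
images = response.json()["images"]  # list of base64-encoded PNGs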

We also need Skipping-Step from the paper. Without it we still have an incomplete implementation.

We hope @AUTOMATIC1111 will add this sampler natively and correctly, because I know nothing about diffusion theory.

There's this: https://github.com/0xbitches/sd-webui-lcm

Nooo. That is a converted Gradio demo of the LCM model from before the LoRAs were released.

It is a separate tab; it's no longer relevant.

Thanks. So is the light-and-ray thing an actual solution? I'm hoping for a version that will work in Deform.

Yes, you're right

Use my extension to use the LCM sampler in exactly the way you describe: https://github.com/continue-revolution/sd-webui-animatediff#lcm

Thanks for your research! Any ideas why the other samplers don't work for SDXL?

Does @AUTOMATIC1111 have time to support the LCM sampler?

If you install the latest update of the AnimateDiff extension, it installs the LCM sampler for you. No need for this workaround now.
Link: https://github.com/continue-revolution/sd-webui-animatediff

Dude, what a BLESSING! Installing Animatediff just added the LCM sampler as well. Also you can totally increase the step count to 10-15 and get even better quality it seems. Thank you!

I am getting an image per second on a 3090 Ti; with TensorRT (no LCM) it is around 3.5 seconds per image.
But the results are so bad it's not worth using. I have tried 8-15 steps and a CFG scale of 2; the results come out almost identical to each other in all ways, without deep colors. Do you get decent results?

I'm using an RTX 3060 12GB and I'm getting surprisingly good results. My settings are lora weight of 0.75, LCM sampler, 6-8 steps and CFG of 2.0. Try these settings and see if things improve at all for you.

And with LCM you want TinyVAE. It turns 70 ms 512x512 4-step generations into 45 ms on my 4090.
For some reason the way A1111 fuses the LoRA must be different. It only gives about a 2.5× speed-up instead of the 10× I get in a simple diffusers pipeline.
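
"TinyVAE" here presumably refers to TAESD; in a plain diffusers pipeline it can be swapped in roughly like this (a sketch, assuming the madebyollin/taesd weights and an example SD 1.5 checkpoint):

import torch
from diffusers import StableDiffusionPipeline, LCMScheduler, AutoencoderTiny

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# replace the full VAE with the tiny approximate one for a much faster decode
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16).to("cuda")

image = pipe("a photo of a cat", num_inference_steps=4, guidance_scale=1.0).images[0]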

You need to understand several things:

  1. Most likely comfyanonymous is paid by SAI so that he can work full time on ComfyUI, but we A1111 developers are paid nothing.
  2. Although A1111 is much more convenient to use compared to ComfyUI, its internals are much harder to modify. If you read my source code for AnimateDiff, you will understand this much more deeply. ComfyUI, by contrast, is made of a bunch of nodes which barely affect each other.
  3. A1111 takes cautious steps when adding new features - most features are not universal, so it's better for them to come as an extension. This is actually better for us programmers - our extensions will not be abruptly broken.
  4. It is not easy to write an elegant implementation to convert diffusers-based research into A1111. It is actually easy to write a tab that forces you to use diffusers, but I don't want to do that.
  5. Theoretically A1111 should not be slower than anything else. A known reason that A1111 is "slower" than diffusers is that some samplers require two UNet forwards per step. In diffusers, "steps" means the number of UNet forwards; in A1111, it means the number of sampler steps (see the toy sketch below). There might be other reasons.
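
As a toy illustration of point 5 (my own sketch, not webui code), here is how a second-order sampler doubles the UNet call count, which is what makes the two "step" counts diverge:

calls = 0

def unet(x):
    # stand-in for the real UNet forward pass
    global calls
    calls += 1
    return x

def second_order_step(x):
    # e.g. Heun- or DPM++ 2S-style samplers evaluate the UNet twice per sampler step
    return unet(unet(x))

x = 0.0
for _ in range(20):  # 20 "steps" as A1111 counts them
    x = second_order_step(x)

print(calls)  # 40 UNet forwards, which diffusers would count as 40 steps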

Do you feel like Automatic1111 is a dead-end in the long run? Should we all be putting more effort into learning ComfyUI?

I've never for a second had any thoughts like this, and I will almost certainly stick to A1111. I believe that creating a great user experience is our (programmers') mission, and you, the users, should focus more on how to use our software and teach us how it is actually used. A lot of people are much better at using my extension than I am. Designing user-friendly software is never easy in any UI, but I enjoy it.

A1111 has done his best. A1111 WebUI is easy to use, fast and memory efficient. We should not criticize A1111 without clear evidence. Despite the fact that it is tricky to hook some functions, some other functions are designed so well that they easily fit my needs.

That said, we programmers do need money. Working for love is never sustainable, unless we are as rich as Mark Zuckerberg. Mark is already extremely rich through Facebook, so he open-sourced almost everything about ML at Meta. Sam is not as rich as Mark, so OpenAI became CloseAI.

They have updated the sampler's algorithm:
https://github.com/huggingface/diffusers/pull/5836/files

@light-and-ray How do I use this update? I tested LCM and there are some problems with ControlNet inpaint: it doesn't recognize red anymore.

Please can you elaborate on how to install TensorRT? I have an RTX 2060. Do I need to install a driver from NVIDIA, or just the webui TensorRT extension?

@continue-revolution could you apply this update in your extension?

Post a feature request in my repo and @ the original author of LCM. He will make the decision.

After trying it out, it's really fantastic. For the Sampling method, select Euler a, Euler, or LCM. Set the Sampling steps to 6-8, use the LoRA as either lora:lcm-lora-sdv1-5:0.6 or lora:lcm-lora-sdv1-5:0.7, and set the CFG Scale to either 1.2 or 1.5.
xyz_grid-0002-3834625937
xyz_grid-0003-3834625937

Yes, but for high-quality animation you can do even 15 steps. Try it with LCM just like you did, but ONLY for AnimateDiff animations, not for photos.

@LIQUIDMIND111 Why do you say NOT for photos? The images aren't perfect, but after an Ultimate Upscale application, the results look really, really nice. 🤷🏼‍♂️

@continue-revolution
It seems that LCM in AnimateDiff is not compatible with regional-prompter. I found in my test that after installing regional-prompter (without enabling Active), it takes 5 seconds to draw, but after uninstalling regional-prompter the same parameters only take 1 second to draw.