AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Feature Request]: new sampler "lcm" for lcm-loras

light-and-ray opened this issue · comments

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

This feature doesn't require much: no new model class support, only a new sampler.

LCM had a major update, and now we can use it like a regular LoRA:
https://huggingface.co/latent-consistency/lcm-lora-sdv1-5
https://huggingface.co/latent-consistency/lcm-lora-ssd-1b
https://huggingface.co/latent-consistency/lcm-lora-sdxl

We only need to rename them and put them into the LoRA models directory, then set Sampling steps to 4, CFG Scale to 1.0, and the Sampling method to "DPM2" or "Euler a". This gives decent results, but for best quality it needs a dedicated sampler, which is similar to the existing ones with a small change. You can see how it works in ComfyUI: comfyanonymous/ComfyUI@002aefa
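
For reference, this is roughly what the same recipe looks like in a plain diffusers pipeline (just an illustration, not webui code; the base model id below is only an example):

import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

# any SD 1.5 checkpoint works the same way; this id is only an example
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# swap in the LCM scheduler and load the LCM-LoRA weights
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# the same settings as above: very few steps, CFG around 1
image = pipe("a photo of a cat", num_inference_steps=4, guidance_scale=1.0).images[0]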

Proposed workflow

  1. Go to Sampling method
  2. Select LCM sampler

Additional information

How it works now
Screenshot 2023-11-12 at 00-13-13 Stable Diffusion

ComfyUI's recent update:
Screenshot_20231112_001837

I'm awaiting the integration of the LCM sampler into AUTOMATIC1111. While AUTOMATIC1111 is an excellent program, the implementation of new features, such as the LCM sampler and the consistency VAE, appears to be sluggish. The consistency VAE is currently only accessible in beta, while ComfyUI already offers both consistency VAE and LCM sampler support in no time :(

Let's not forget that TensorRT for SDXL is sitting on the Dev branch, too. I desperately need all three of these moved up into the Main branch.

let's hope this will be soon 🤞🤞🤞🤞

Someone reported nearly 5 gens per second on a 3090 with both TensorRT and LCM Lora. We're living in a pretty good timeline over here. Can't wait to see all this stuff get in so we can use it all together. Never ever thought I'd see a factor of 15+ speed up this quick.

Makes me feel happy with my 3090 Ti!
Can you elaborate more on TensorRT? Is that something that can be enabled in AUTOMATIC1111? Aren't SDP and xformers the only options?

I could only make TensorRT work with LCM after merging the LCM LoRA into the model (and training the difference in SuperMerger).

I'm looking forward to this feature too.
I want to use ControlNets with it.

I just installed TensorRT and yeah, it's amazing (3090 Ti).
SDXL 1024x1024 went from 11 seconds to 3.5 seconds an image! It's a must!

@iChristGit I assume you ran this on the Dev branch?

Yeah git checkout dev

Someone on Reddit implemented it 4 days ago:
https://www.reddit.com/r/StableDiffusion/comments/17ti2zo/you_can_add_the_lcm_sampler_to_a1111_with_a/

You can add the LCM sampler to A1111 with a little trick

So I was trying out the new LCM LoRA and found out the sampler is missing in A1111. As a long shot I just copied the code from Comfy, and to my surprise it seems to work. I think.

You have to make two small edits with a text editor. Here's how you do it:

Edit the file sampling.py found at this path:

...\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py

Add the following at the end of the file:

@torch.no_grad()
def sample_lcm(model, x, sigmas, extra_args=None, callback=None, disable=None, noise_sampler=None):
    # torch, trange and default_noise_sampler are already imported at the top of sampling.py
    extra_args = {} if extra_args is None else extra_args
    noise_sampler = default_noise_sampler(x) if noise_sampler is None else noise_sampler
    s_in = x.new_ones([x.shape[0]])
    for i in trange(len(sigmas) - 1, disable=disable):
        # ask the model for a fully denoised prediction at the current sigma
        denoised = model(x, sigmas[i] * s_in, **extra_args)
        if callback is not None:
            callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})

        # LCM step: take the denoised prediction and re-noise it to the next sigma level
        x = denoised
        if sigmas[i + 1] > 0:
            x += sigmas[i + 1] * noise_sampler(sigmas[i], sigmas[i + 1])
    return x

The second change is done in the file sd_samplers_kdiffusion.py found here:

...\stable-diffusion-webui-new\modules\sd_samplers_kdiffusion.py

On line 39 add this:

('LCM Test', 'sample_lcm', ['lcm'], {}),

That should give you a new sampler option called 'LCM Test'.

(screenshot: the new 'LCM Test' sampler option)
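
If you run the webui with --api, a quick way to confirm the new sampler was registered is to query the samplers endpoint (a sketch; host and port assume a default local install):

import requests

# /sdapi/v1/samplers returns the registered sampler names and aliases
samplers = requests.get("http://127.0.0.1:7860/sdapi/v1/samplers").json()
print([s["name"] for s in samplers])  # should now include 'LCM Test'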

But a bug was also found in the LCM scheduler, and the algorithm will be updated:
huggingface/diffusers#5815

There is almost no difference between "Euler a" and "LCM Test" for SD1, but for SDXL it solves all the problems.

(comparison images: SDXL and SD 1.5)

I've modified the patch from Reddit so it doesn't edit the external repository:

diff --git a/modules/sd_samplers_extra.py b/modules/sd_samplers_extra.py
index 1b981ca8..d154a2b6 100644
--- a/modules/sd_samplers_extra.py
+++ b/modules/sd_samplers_extra.py
@@ -72,3 +72,20 @@ def restart_sampler(model, x, sigmas, extra_args=None, callback=None, disable=No
         last_sigma = new_sigma

     return x
+
+
+@torch.no_grad()
+def sample_lcm(model, x, sigmas, extra_args=None, callback=None, disable=None, noise_sampler=None):
+    extra_args = {} if extra_args is None else extra_args
+    noise_sampler = k_diffusion.sampling.default_noise_sampler(x) if noise_sampler is None else noise_sampler
+    s_in = x.new_ones([x.shape[0]])
+    for i in tqdm.auto.trange(len(sigmas) - 1, disable=disable):
+        denoised = model(x, sigmas[i] * s_in, **extra_args)
+        if callback is not None:
+            callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
+
+        x = denoised
+        if sigmas[i + 1] > 0:
+            x += sigmas[i + 1] * noise_sampler(sigmas[i], sigmas[i + 1])
+    return x
+
diff --git a/modules/sd_samplers_kdiffusion.py b/modules/sd_samplers_kdiffusion.py
index 8a8c87e0..b6c3dc44 100644
--- a/modules/sd_samplers_kdiffusion.py
+++ b/modules/sd_samplers_kdiffusion.py
@@ -36,6 +36,7 @@ samplers_k_diffusion = [
     ('DPM2 a Karras', 'sample_dpm_2_ancestral', ['k_dpm_2_a_ka'], {'scheduler': 'karras', 'discard_next_to_last_sigma': True, "uses_ensd": True, "second_order": True}),
     ('DPM++ 2S a Karras', 'sample_dpmpp_2s_ancestral', ['k_dpmpp_2s_a_ka'], {'scheduler': 'karras', "uses_ensd": True, "second_order": True}),
     ('Restart', sd_samplers_extra.restart_sampler, ['restart'], {'scheduler': 'karras', "second_order": True}),
+    ('LCM Test', sd_samplers_extra.sample_lcm, ['lcm'], {}),
 ]

I have wrapped this patch in an extension! 🎉
https://github.com/light-and-ray/sd-webui-lcm-sampler
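
Once the patch (or the extension) is in place, here is a rough sketch of driving the new sampler through the standard txt2img API (assuming the webui was launched with --api and the LoRA file was renamed to lcm-lora-sdv1-5):

import requests

payload = {
    "prompt": "a photo of a cat <lora:lcm-lora-sdv1-5:1>",  # LoRA filename is an assumption
    "steps": 4,
    "cfg_scale": 1.0,
    "sampler_name": "LCM Test",  # the label registered in sd_samplers_kdiffusion.py above
    "width": 512,
    "height": 512,
}
response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
response.raise_for_status()
images = response.json()["images"]  # list of base64-encoded PNGs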

We also need Skipping-Step from the paper. Without it we still have an incomplete implementation.

We hope @AUTOMATIC1111 will add this sampler natively and correctly, because I know nothing about diffusion theory.

There's this: https://github.com/0xbitches/sd-webui-lcm

Nooo. That is a converted Gradio demo of the LCM model from before the LoRAs were released.

It is a separate tab; it's no longer relevant.

Thanks. So is the light-and-ray thing an actual solution? I'm hoping for a version that will work in Deform.

Yes, you're right

Use my extension to use the LCM sampler in exactly the way you describe: https://github.com/continue-revolution/sd-webui-animatediff#lcm

Thanks for your research! Any ideas why the other samplers don't work for SDXL?

Does @AUTOMATIC1111 have time to support the LCM sampler?

If you install the latest update of the AnimateDiff extension, it installs the LCM sampler for you. No need for this workaround now.
Link: https://github.com/continue-revolution/sd-webui-animatediff

Dude, what a BLESSING! Installing Animatediff just added the LCM sampler as well. Also you can totally increase the step count to 10-15 and get even better quality it seems. Thank you!

I am getting an image per second on a 3090 Ti; with TensorRT (no LCM) it is around 3.5 seconds per image.
But the results are so bad it's not worth using. I have tried 8-15 steps and a CFG scale of 2; the results come out almost identical to each other in all ways, without deep colors. Do you get decent results?

I'm using an RTX 3060 12GB and I'm getting surprisingly good results. My settings are lora weight of 0.75, LCM sampler, 6-8 steps and CFG of 2.0. Try these settings and see if things improve at all for you.

And with LCM you want TinyVAE. It turns 70 ms 512x512 4-step generations into 45 ms on my 4090.
For some reason the way A1111 fuses the LoRA must be different. It only gives about a 2.5× speed-up instead of the 10× I get in a simple diffusers pipeline.
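
"TinyVAE" here presumably refers to TAESD; in a plain diffusers pipeline it can be swapped in roughly like this (a sketch, assuming the madebyollin/taesd weights and an example SD 1.5 checkpoint):

import torch
from diffusers import StableDiffusionPipeline, LCMScheduler, AutoencoderTiny

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# replace the full VAE with the tiny approximate one for a much faster decode
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16).to("cuda")

image = pipe("a photo of a cat", num_inference_steps=4, guidance_scale=1.0).images[0]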

You need to understand several things:

  1. Most likely comfyanonymous is paid by SAI so that he can work full time on ComfyUI, but we A1111 developers are paid nothing.
  2. Although A1111 is much more convenient to use compared to ComfyUI, its internals are much harder to modify. If you read my source code for AnimateDiff, you will understand this much more deeply. ComfyUI, by contrast, is made of a bunch of nodes which barely affect each other.
  3. A1111 takes cautious steps when adding new features - most features are not universal, so it's better for them to come as an extension. This is actually better for us programmers - our extensions will not be abruptly broken.
  4. It is not easy to write an elegant implementation to convert diffusers-based research into A1111. It is actually easy to write a tab that forces you to use diffusers, but I don't want to do that.
  5. Theoretically A1111 should not be slower than anything else. A known reason that A1111 is "slower" than diffusers is that some samplers require two UNet forwards per step. In diffusers, "steps" means the number of UNet forwards; in A1111, it means the number of sampler steps (see the toy sketch below). There might be other reasons.
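
As a toy illustration of point 5 (my own sketch, not webui code), here is how a second-order sampler doubles the UNet call count, which is what makes the two "step" counts diverge:

calls = 0

def unet(x):
    # stand-in for the real UNet forward pass
    global calls
    calls += 1
    return x

def second_order_step(x):
    # e.g. Heun- or DPM++ 2S-style samplers evaluate the UNet twice per sampler step
    return unet(unet(x))

x = 0.0
for _ in range(20):  # 20 "steps" as A1111 counts them
    x = second_order_step(x)

print(calls)  # 40 UNet forwards, which diffusers would count as 40 steps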

Do you feel like Automatic1111 is a dead-end in the long run? Should we all be putting more effort into learning ComfyUI?

I've never for a second had any thoughts like this, and I will almost certainly stick to A1111. I believe that creating a great user experience is our (programmers') mission, and you, the users, should focus more on how to use our software and teach us how it is actually used. A lot of people are much better at using my extension than I am. Designing user-friendly software is never easy in any UI, but I enjoy it.

A1111 has done his best. A1111 WebUI is easy to use, fast and memory efficient. We should not criticize A1111 without clear evidence. Despite the fact that it is tricky to hook some functions, some other functions are designed so well that they easily fit my needs.

That said, we programmers do need money. Working for love is never sustainable, unless we are as rich as Mark Zuckerberg. Mark is already extremely rich through Facebook, so he open-sourced almost everything about ML at Meta. Sam is not as rich as Mark, so OpenAI became CloseAI.

They have updated the sampler's algorithm:
https://github.com/huggingface/diffusers/pull/5836/files

@light-and-ray How do I use this update? I tested LCM and there are some problems with ControlNet inpaint: it doesn't recognize red anymore.

Please can you elaborate on how to install TensorRT? I have an RTX 2060. Do I need to install a driver from NVIDIA, or just the webui TensorRT extension?

@continue-revolution could you apply this update in your extension?

Post a feature request in my repo and @ the original author of LCM. He will make the decision.

After trying it out, it's really fantastic. For the Sampling method, select Euler a, Euler, or LCM. Set the Sampling steps to 6-8, use the LoRA as either lora:lcm-lora-sdv1-5:0.6 or lora:lcm-lora-sdv1-5:0.7, and set the CFG Scale to either 1.2 or 1.5.
xyz_grid-0002-3834625937
xyz_grid-0003-3834625937

Yes, but for high-quality animation you can do even 15 steps. Try it with LCM just like you did, but ONLY for AnimateDiff animations, not for photos.

@LIQUIDMIND111 Why do you say NOT for photos? The images aren't perfect, but after an Ultimate Upscale application, the results look really, really nice. 🤷🏼‍♂️

@continue-revolution
It seems that LCM in AnimateDiff is not compatible with regional-prompter. I found in my test that after installing regional-prompter (without enabling Active), it takes 5 seconds to draw, but after uninstalling regional-prompter the same parameters only take 1 second to draw.