lllyasviel / stable-diffusion-webui-forge

[Bug]: Moving model takes much longer with LoRAs

Super-zapper opened this issue

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

I have an RTX 3060 with 6 GB of VRAM. Generating one SDXL image takes about 19 seconds, but if I add 2 LoRAs it takes about 35 seconds, and almost 20 of those are spent on moving the model. I am not sure whether that is wrong, but the increase seems disproportionate.


Steps to reproduce the problem

Just run txt2img with and without LoRAs added.

What should have happened?

I believe the difference in generation time should not be this significant; right now a generation takes about 2 times longer when I add 2 LoRAs.

What browsers do you use to access the UI?

Google Chrome, Brave

Sysinfo

sysinfo-2024-06-03-04-54.json

Console logs

Already up to date.
venv "E:\Forge SD\venv\Scripts\Python.exe"
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: f0.0.17v1.8.0rc-latest-276-g29be1da7
Commit hash: 29be1da7cf2b5dccfc70fbdd33eb35c56a31ffb7
CUDA 12.1
Launching Web UI with arguments: --api --xformers
Total VRAM 6144 MB, total RAM 16147 MB
WARNING:xformers:A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xformers version: 0.0.23.post1
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3060 Laptop GPU : native
Hint: your device supports --pin-shared-memory for potential speed improvements.
Hint: your device supports --cuda-malloc for potential speed improvements.
Hint: your device supports --cuda-stream for potential speed improvements.
VAE dtype: torch.bfloat16
CUDA Stream Activated:  False
Using xformers cross attention
ControlNet preprocessor location: E:\Forge SD\models\ControlNetPreprocessor
Using sqlite file: E:\Forge SD\extensions\sd-webui-agent-scheduler\task_scheduler.sqlite3
01:55:22 - ReActor - STATUS - Running v0.7.0-b7 on Device: CUDA
Loading weights [67ab2fd8ec] from E:\Forge SD\models\Stable-diffusion\ponyDiffusionV6XL_v6StartWithThisOne.safetensors
2024-06-03 01:55:22,653 - ControlNet - INFO - ControlNet UI callback registered.
model_type EPS
UNet ADM Dimension 2816
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 23.2s (prepare environment: 6.4s, import torch: 5.6s, import gradio: 1.1s, setup paths: 1.0s, initialize shared: 0.2s, other imports: 0.9s, load scripts: 4.2s, create ui: 1.0s, gradio launch: 0.4s, add APIs: 0.8s, app_started_callback: 1.4s).
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.text_projection'}
To load target model SDXLClipModel
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  5120.6982421875
[Memory Management] Model Memory (MB) =  2144.3546981811523
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  1952.3435440063477
Moving model(s) has taken 0.82 seconds
Model loaded in 14.9s (load weights from disk: 0.9s, forge load real models: 12.5s, load textual inversion embeddings: 0.1s, calculate empty prompt: 1.2s).
To load target model SDXL
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  5084.38427734375
[Memory Management] Model Memory (MB) =  4897.086494445801
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  -836.7022171020508
[Memory Management] Requested SYNC Preserved Memory (MB) =  3123.3725204467773
[Memory Management] Parameters Loaded to SYNC Stream (MB) =  1773.6767654418945
[Memory Management] Parameters Loaded to GPU (MB) =  3123.37158203125
Moving model(s) has taken 3.02 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 15/15 [00:16<00:00,  1.08s/it]
To load target model AutoencoderKL█████████████████████████████████████████████████████| 15/15 [00:13<00:00,  1.00s/it]
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  5055.88427734375
[Memory Management] Model Memory (MB) =  159.55708122253418
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  3872.327196121216
Moving model(s) has taken 1.15 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 15/15 [00:16<00:00,  1.09s/it]
[LORA] Loaded E:\Forge SD\models\Lora\xl_more_art-full_v1.safetensors for SDXL-UNet with 788 keys at weight 0.8 (skipped 0 keys)
[LORA] Loaded E:\Forge SD\models\Lora\Smooth Anime 2 Style SDXL_LoRA_Pony Diffusion V6 XL.safetensors for SDXL-UNet with 722 keys at weight 1.0 (skipped 0 keys)
[LORA] Loaded E:\Forge SD\models\Lora\Smooth Anime 2 Style SDXL_LoRA_Pony Diffusion V6 XL.safetensors for SDXL-CLIP with 264 keys at weight 1.0 (skipped 0 keys)
To load target model SDXLClipModel
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  4884.6904296875
[Memory Management] Model Memory (MB) =  2144.3546981811523
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  1716.3357315063477
Moving model(s) has taken 0.86 seconds
To load target model SDXL
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  5046.9013671875
[Memory Management] Model Memory (MB) =  4897.086494445801
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  -874.1851272583008
[Memory Management] Requested SYNC Preserved Memory (MB) =  3094.5395126342773
[Memory Management] Parameters Loaded to SYNC Stream (MB) =  1802.6171875
[Memory Management] Parameters Loaded to GPU (MB) =  3094.4311599731445
Moving model(s) has taken 70.16 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 15/15 [00:16<00:00,  1.09s/it]
To load target model AutoencoderKL█████████████████████████████████████████████████████| 15/15 [00:14<00:00,  1.00it/s]
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  5038.4013671875
[Memory Management] Model Memory (MB) =  159.55708122253418
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  3854.844285964966
Moving model(s) has taken 0.52 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 15/15 [00:15<00:00,  1.07s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 15/15 [00:15<00:00,  1.00it/s]

Additional information

In the console log you will see that the first generation, without any LoRAs, takes 3 seconds to move the model (that is the first attempt; subsequent ones take about 1 second). The second generation, with 2 LoRAs, takes 70 seconds to move the model (as the first such generation it takes much longer; subsequent ones still take at least 19 seconds).
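For reference, the "Estimated Remaining GPU Memory" lines in the log follow simple arithmetic: free VRAM minus model size minus a fixed inference buffer. When the result goes negative, Forge loads only part of the weights to the GPU and streams the rest (the "SYNC Stream" lines), which is where the extra moving time comes from. A minimal sketch of that check, reconstructed from the printed numbers rather than from Forge's actual code:

```python
# Sketch of the budget check behind the "[Memory Management]" log lines.
# (My reconstruction from the printed numbers, not Forge's actual code.)

def estimate_remaining_gpu_mb(free_gpu_mb: float, model_mb: float,
                              minimal_inference_mb: float = 1024.0) -> float:
    """Mirrors the 'Estimated Remaining GPU Memory (MB)' log line."""
    return free_gpu_mb - model_mb - minimal_inference_mb

# Second SDXL move (with LoRAs) from the log above:
deficit = estimate_remaining_gpu_mb(5046.9013671875, 4897.086494445801)
print(round(deficit, 2))  # -874.19, matching the log: the UNet cannot fully
# stay on the GPU, so ~1.8 GB of parameters goes to the SYNC stream instead
```

The CLIP and VAE moves in the log stay positive under the same arithmetic, which is why only the SDXL UNet move triggers the streaming path.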

In your settings under Extra Networks, how many LoRAs do you have set to be cached in memory?

Well, it was 0, but I have now tried 2 and it does not help. I also tried increasing the maximum number of checkpoints to be loaded, and that did not help either.
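For context on why the move takes longer at all: applying a LoRA means patching the base weights with a low-rank delta, W' = W + alpha * (up @ down), for every affected key (788 and 722 UNet keys in the log above), so the patched tensors have to be transferred to the GPU again. A hedged illustration of the standard LoRA math, not Forge's implementation:

```python
import numpy as np

def apply_lora(weight, down, up, alpha=0.8):
    """Merge a LoRA delta into a base weight: W' = W + alpha * (up @ down).

    Illustrative sketch of standard LoRA math, not Forge's code: every
    patched key produces a new tensor that a partially offloaded model
    must move to the GPU again.
    """
    return weight + alpha * (up @ down)

rng = np.random.default_rng(0)
W = rng.standard_normal((640, 640)).astype(np.float32)     # base weight
down = rng.standard_normal((16, 640)).astype(np.float32)   # rank-16 factor
up = rng.standard_normal((640, 16)).astype(np.float32)     # rank-16 factor
W_patched = apply_lora(W, down, up)
print(W_patched.shape)  # (640, 640): same shape as W, but new contents
```

This would explain why caching LoRAs alone does not remove the cost: the expensive part is re-moving the patched base weights, not re-reading the LoRA files.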