lllyasviel / stable-diffusion-webui-forge

[Bug]: IP-Adapter ControlNet not working

genialgenteel opened this issue

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

Hello!

I'm not sure if this will ever be addressed now that active development of Forge for average end users is being suspended, but I figured I'd ask anyway...

When I try to use the IP-Adapter ControlNet models, I get an error and the ControlNet is not applied. No matter which IP-Adapter preprocessor and/or model (plus the LoRA, where one is needed) I use, I get the same error (see the console logs below) and it doesn't work. Other ControlNets like Depth, OpenPose, and Canny still work.

I haven't tried reproducing this on a clean install, but I have disabled all my third-party extensions. Weirdly, it worked exactly twice right after I first disabled them. When I turned the extensions back on, the error returned. I turned everything back off again, but the miracle was over and I kept getting the same error whether they were on or off. I restarted the computer and reopened SD with all third-party extensions already disabled; it still didn't work. I'm at a loss.

If anyone has any idea what's wrong, I'd appreciate some assistance. Thanks.

Steps to reproduce the problem

  1. Enter control image in ControlNet
  2. Select IP-Adapter
  3. Pick a matching preprocessor/model pair, e.g. InsightFace+CLIP-H (IPAdapter) with ip-adapter-faceid-plusv2_sd15 [6e14fc1a] (plus ip-adapter-faceid-plusv2_sd15_lora), or CLIP-ViT-H (IPAdapter) with ip-adapter-plus-face_sd15 [71693645]
  4. Generate image (a rough scripted version of these steps is sketched right below)
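
If it helps anyone reproduce this without clicking through the UI, here is a rough scripted version of the steps above. To be clear, this is only a sketch of how I understand the webui/ControlNet API to work, not something I have verified on this Forge build: the /sdapi/v1/txt2img endpoint and the "ControlNet" alwayson_scripts unit fields (enabled/image/module/model/weight) follow the usual ControlNet extension API shape, the port is the default 7860, and the module/model strings are simply the ones shown in the UI (the API may want different aliases; /controlnet/module_list should list them, if that endpoint is available in Forge).

import base64
import requests

# Control image from step 1, sent as base64 (assumption: a plain base64 string is
# accepted for the unit's "image" field, as in the upstream ControlNet extension API).
with open("control_face.png", "rb") as f:
    control_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "prompt": "portrait photo",
    "steps": 20,
    "width": 512,
    "height": 512,
    "alwayson_scripts": {
        "ControlNet": {
            "args": [
                {
                    "enabled": True,
                    "image": control_image,
                    # UI names from step 3; the API aliases may differ (assumption)
                    "module": "InsightFace+CLIP-H (IPAdapter)",
                    "model": "ip-adapter-faceid-plusv2_sd15 [6e14fc1a]",
                    "weight": 1.0,
                }
            ]
        }
    },
}

# Default local URL/port assumed; adjust for --listen / --port.
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
r.raise_for_status()
print(list(r.json().keys()))  # "images" should be present when generation succeeds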

What should have happened?

The ControlNet should have been applied.

What browsers do you use to access the UI?

Mozilla Firefox

Sysinfo

sysinfo - Copy.txt

Console logs

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f0.0.17v1.8.0rc-latest-276-g29be1da7
Commit hash: 29be1da7cf2b5dccfc70fbdd33eb35c56a31ffb7
Launching Web UI with arguments: --enable-insecure-extension-access --listen --port XXXX --xformers --upcast-sampling --disable-safe-unpickle --theme dark --no-hashing
Total VRAM 8192 MB, total RAM 16069 MB
WARNING:xformers:A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xformers version: 0.0.23.post1
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3070 Ti Laptop GPU : native
Hint: your device supports --pin-shared-memory for potential speed improvements.
Hint: your device supports --cuda-malloc for potential speed improvements.
Hint: your device supports --cuda-stream for potential speed improvements.
VAE dtype: torch.bfloat16
CUDA Stream Activated:  False
Using xformers cross attention
ControlNet preprocessor location: C:\Users\USERNAME\Pictures\sd.webui\webui\models\ControlNetPreprocessor
Loading weights [None] from C:\Users\USERNAME\Pictures\sd.webui\webui\models\Stable-diffusion\SD15\MODELNAME.safetensors
2024-06-22 13:06:14,410 - ControlNet - INFO - ControlNet UI callback registered.
Running on local URL:  XXXXXXXXXXXXXX
model_type EPS
UNet ADM Dimension 0
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
loaded straight to GPU
To load target model BaseModel
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  5422.03662109375
[Memory Management] Model Memory (MB) =  0.00762939453125
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  4398.028991699219
Moving model(s) has taken 0.01 seconds
To load target model SD1ClipModel
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  5421.98388671875
[Memory Management] Model Memory (MB) =  454.2076225280762
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  3943.776264190674
Moving model(s) has taken 0.07 seconds

To create a public link, set `share=True` in `launch()`.
Startup time: 16.8s (prepare environment: 4.5s, import torch: 3.8s, import gradio: 0.9s, setup paths: 0.5s, initialize shared: 0.1s, other imports: 0.6s, load scripts: 1.4s, create ui: 0.6s, gradio launch: 4.2s).
Model loaded in 5.0s (load weights from disk: 0.3s, forge instantiate config: 0.2s, forge load real models: 3.8s, calculate empty prompt: 0.7s).
2024-06-22 13:07:26,226 - ControlNet - INFO - ControlNet Input Mode: InputMode.SIMPLE
2024-06-22 13:07:26,226 - ControlNet - INFO - Using preprocessor: InsightFace+CLIP-H (IPAdapter)
2024-06-22 13:07:26,226 - ControlNet - INFO - preprocessor resolution = 512
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: C:\Users\USERNAME\Pictures\sd.webui\webui\models\insightface\models\buffalo_l\1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: C:\Users\USERNAME\Pictures\sd.webui\webui\models\insightface\models\buffalo_l\2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: C:\Users\USERNAME\Pictures\sd.webui\webui\models\insightface\models\buffalo_l\det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: C:\Users\USERNAME\Pictures\sd.webui\webui\models\insightface\models\buffalo_l\genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: C:\Users\USERNAME\Pictures\sd.webui\webui\models\insightface\models\buffalo_l\w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
Warning torch.load doesn't support weights_only on this pytorch version, loading unsafely.
2024-06-22 13:07:34,574 - ControlNet - INFO - Current ControlNet IPAdapterPatcher: C:\Users\USERNAME\Pictures\sd.webui\webui\models\ControlNet\ip-adapter-faceid-plusv2_sd15.bin
NeverOOM Enabled for UNet (always maximize offload)
NeverOOM Enabled for VAE (always tiled)
VARM State Changed To NO_VRAM
[LORA] Loaded C:\Users\USERNAME\Pictures\sd.webui\webui\models\Lora\SD15\LORANAME.safetensors for BaseModel-UNet with 192 keys at weight 0.9 (skipped 0 keys)
[LORA] Loaded C:\Users\USERNAME\Pictures\sd.webui\webui\models\Lora\SD15\LORANAME.safetensors for BaseModel-CLIP with 72 keys at weight 0.9 (skipped 0 keys)
To load target model SD1ClipModel
Begin to load 1 model
[Memory Management] Requested SYNC Preserved Memory (MB) =  0.0
[Memory Management] SYNC Loader Disabled for  EmbeddingsWithFixes(
  (wrapped): Embedding(49408, 768)
)
[Memory Management] SYNC Loader Disabled for  Embedding(49408, 768)
[Memory Management] SYNC Loader Disabled for  Embedding(77, 768)
[Memory Management] Parameters Loaded to SYNC Stream (MB) =  162.2314453125
[Memory Management] Parameters Loaded to GPU (MB) =  434.4755859375
Moving model(s) has taken 0.17 seconds
token_merging_ratio = 0.1
C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\insightface\utils\transform.py:68: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
  P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4
INFO: InsightFace detection resolution lowered to (384, 384).
To load target model CLIPVisionModelWithProjection
Begin to load 1 model
[Memory Management] Requested SYNC Preserved Memory (MB) =  0.0
[Memory Management] SYNC Loader Disabled for  Embedding(257, 1280)
[Memory Management] Parameters Loaded to SYNC Stream (MB) =  1204.9609375
[Memory Management] Parameters Loaded to GPU (MB) =  0.62744140625
Moving model(s) has taken 0.04 seconds
*** Error running process_before_every_sampling: C:\Users\USERNAME\Pictures\sd.webui\webui\extensions-builtin\sd_forge_controlnet\scripts\controlnet.py
    Traceback (most recent call last):
      File "C:\Users\USERNAME\Pictures\sd.webui\webui\modules\scripts.py", line 835, in process_before_every_sampling
        script.process_before_every_sampling(p, *script_args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\webui\extensions-builtin\sd_forge_controlnet\scripts\controlnet.py", line 555, in process_before_every_sampling
        self.process_unit_before_every_sampling(p, unit, self.current_params[i], *args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\webui\extensions-builtin\sd_forge_controlnet\scripts\controlnet.py", line 501, in process_unit_before_every_sampling
        params.model.process_before_every_sampling(p, cond, mask, *args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\webui\extensions-builtin\sd_forge_ipadapter\scripts\forge_ipadapter.py", line 147, in process_before_every_sampling
        unet = opIPAdapterApply(
      File "C:\Users\USERNAME\Pictures\sd.webui\webui\extensions-builtin\sd_forge_ipadapter\lib_ipadapter\IPAdapterPlus.py", line 690, in apply_ipadapter
        clip_embed = clip_vision.encode_image(image).penultimate_hidden_states
      File "C:\Users\USERNAME\Pictures\sd.webui\webui\ldm_patched\modules\clip_vision.py", line 70, in encode_image
        outputs = self.model(pixel_values=pixel_values, output_hidden_states=True)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\transformers\models\clip\modeling_clip.py", line 1310, in forward
        vision_outputs = self.vision_model(
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\transformers\models\clip\modeling_clip.py", line 865, in forward
        hidden_states = self.embeddings(pixel_values)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\Users\USERNAME\Pictures\sd.webui\system\python\lib\site-packages\transformers\models\clip\modeling_clip.py", line 199, in forward
        embeddings = torch.cat([class_embeds, patch_embeds], dim=1)
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat)

---
To load target model BaseModel
Begin to load 1 model
[Memory Management] Requested SYNC Preserved Memory (MB) =  0.0
[Memory Management] Parameters Loaded to SYNC Stream (MB) =  1639.406135559082
[Memory Management] Parameters Loaded to GPU (MB) =  0.0
Moving model(s) has taken 0.41 seconds
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:06<00:00,  3.00it/s]
To load target model AutoencoderKL█████████████████████████████████████████████████████████████████████████████████| 20/20 [00:05<00:00,  3.62it/s]
Begin to load 1 model
[Memory Management] Requested SYNC Preserved Memory (MB) =  0.0
[Memory Management] Parameters Loaded to SYNC Stream (MB) =  159.55708122253418
[Memory Management] Parameters Loaded to GPU (MB) =  0.0
Moving model(s) has taken 0.02 seconds
VAE tiled decode: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:01<00:00,  8.54it/s]
Total progress: 100%|██████

Additional information

Input image:
seph_ac_003

The two times IP-Adapter worked when I tested it (512x512 with no upscaling, so they look pretty bad lol):
00010-seizamix_v2_869400007
00009-seizamix_v2_2548859270

Then it stopped working and the outputs reflect that (clearly the IP-Adapter was not applied):
00014-seizamix_v2_2936734378
00015-seizamix_v2_538814642
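
In case it helps with triage: the bottom of the traceback is torch.cat being fed tensors on two different devices inside the CLIP vision embeddings, and my guess (only a guess) is that the NeverOOM / NO_VRAM offloading shown in the log leaves the class embedding on the CPU while the patch embeddings end up on cuda:0. A minimal stand-alone sketch with stand-in tensors (not Forge's actual code) that reproduces the same RuntimeError, plus the obvious way to avoid it:

import torch

if torch.cuda.is_available():
    # Stand-ins for class_embeds / patch_embeds from transformers' modeling_clip.py;
    # shapes roughly match the CLIP-ViT-H vision tower (257 tokens, 1280 dims) seen in the log.
    class_embeds = torch.zeros(1, 1, 1280)                      # stays on the CPU
    patch_embeds = torch.zeros(1, 256, 1280, device="cuda:0")   # already moved to the GPU

    try:
        torch.cat([class_embeds, patch_embeds], dim=1)
    except RuntimeError as e:
        print(e)  # "Expected all tensors to be on the same device ... cpu and cuda:0!"

    # Putting both tensors on the same device before concatenating avoids the error.
    embeddings = torch.cat([class_embeds.to(patch_embeds.device), patch_embeds], dim=1)
    print(embeddings.shape)  # torch.Size([1, 257, 1280])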