damian0815 / compel

A prompting enhancement library for transformers-type text embedding systems


Still get "The size of tensor a (154) must match the size of tensor b (77)" after using the code in the README

Deaddawn opened this issue · comments

Hi there. My code is the following:

```python
import torch
from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1").to('cuda')
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder, truncate_long_prompts=False)
prompt = "a cat playing with a ball++ in the forest, amazing, exquisite, stunning, masterpiece, skilled, powerful, incredible, amazing, trending on gregstation, greg, greggy, greggs greggson, greggy mcgregface,incredible, amazing, trending on gregstation, greg, greggy, greggs greggson, greggy mcgregface,incredible, amazing, trending on gregstation, greg, greggy, greggs greggson, greggy mcgregface"

conditioning = compel.build_conditioning_tensor(prompt)
negative_prompt = "s"
negative_conditioning = compel.build_conditioning_tensor(negative_prompt)
[conditioning, negative_conditioning] = compel.pad_conditioning_tensors_to_same_length([conditioning, negative_conditioning])
print(conditioning.shape)
print(negative_conditioning.shape)
images = pipeline(prompt_embeds=conditioning, negative_prompt_embeds=negative_conditioning, num_inference_steps=50).images[0]
```

I still get this error:

```
File "/root/miniconda/envs/difflate/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 691, in forward
  hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
File "/root/miniconda/envs/difflate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
  return forward_call(*args, **kwargs)
File "/root/miniconda/envs/difflate/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 219, in forward
  embeddings = inputs_embeds + position_embeddings
RuntimeError: The size of tensor a (154) must match the size of tensor b (77) at non-singleton dimension 1
```

Env: diffusers==0.25.0.dev0, compel==2.0.2
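
As a side note (not part of the original report), a quick way to sanity-check the padding step above is to assert that both conditioning tensors end up with identical shapes and a sequence length that is a multiple of CLIP's 77-token window:

```python
# Illustrative sanity check, not from the original report: after
# pad_conditioning_tensors_to_same_length both tensors should be
# [batch, seq_len, dim] with the same seq_len, and (with chunked long
# prompts) seq_len should be a multiple of 77.
assert conditioning.shape == negative_conditioning.shape, \
    f"shape mismatch: {conditioning.shape} vs {negative_conditioning.shape}"
assert conditioning.shape[1] % 77 == 0, \
    f"unexpected sequence length {conditioning.shape[1]}"
```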

Urgh, sorry about that. I'll take a look.

I'm seeing the same issue; it seems to be related to the prompt wording.

I have a similar issue using SDXL. It's probably a subtle off-by-one error.
Killer prompt: "bla bla bla bla bla bla bla bla bla bla bla bla, blabla bla, bla, bla, (bla bla bla bla)0.75, bla, 50mm, bla 4k bla bla 4k bla bla bla bla bla bla bla bla bla bla bla, 35mm bla, blabla, blabla, bla, blablabla, lomography"

```
( bla|bla|bla|bla|bla|bla|bla|bla|bla|bla|bla|bla|,|bla·bla|bla|,|bla|,|bla|,| )1 ( bla|bla|bla|bla| )0.75 ( ,|bla|,|5|0|mm|,|bla|4|k|bla|bla|4|k|bla|bla|bla|bla|bla|bla|bla|bla|bla|bla|bla|,|3|5|mm|bla|,|bla·bla|,|bla·bla|,|bla|,|bla·bla·bla|,|lom·ography| )1
```

RuntimeError: The size of tensor a (78) must match the size of tensor b (77) at non-singleton dimension 1
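
One way to see how close that prompt sits to the chunk boundary is to tokenize it directly and count the ids (a rough diagnostic of my own; the tokenizer repo id and subfolder are assumptions, any SD/SDXL CLIP tokenizer should behave the same way):

```python
from transformers import CLIPTokenizer

# Rough diagnostic (my own sketch): count raw token ids for the killer
# prompt without any chunking. The count includes BOS/EOS, so a value
# just past a multiple of 77 would line up with the off-by-one above.
tokenizer = CLIPTokenizer.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="tokenizer"
)
killer_prompt = "bla bla bla bla bla bla bla bla bla bla bla bla, blabla bla, bla, bla, (bla bla bla bla)0.75, bla, 50mm, bla 4k bla bla 4k bla bla bla bla bla bla bla bla bla bla bla, 35mm bla, blabla, blabla, bla, blablabla, lomography"
print(len(tokenizer(killer_prompt).input_ids))
```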

This is a bit of a stab in the dark, but I think I can narrow it down to EmbeddingsProvider.get_pooled_embeddings: from there the full raw prompt is fed to the tokenizer, including emphasis syntax like "(bla)0.75". Does anyone know if this is supposed to happen?
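
To test whether the emphasis markup itself inflates the pooled-path token count, a rough experiment (my own sketch; the regex only approximates compel's weighting syntax and is not its real parser) is to strip the markup and compare counts:

```python
import re
from transformers import CLIPTokenizer

# My own sketch, not compel code: strip an approximation of the weighting
# syntax and compare token counts against the raw prompt.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

def strip_weight_markup(prompt: str) -> str:
    """Drop (text)0.75-style weights and trailing +/- emphasis (approximate)."""
    prompt = re.sub(r"\(([^()]*)\)[0-9.]+", r"\1", prompt)   # "(bla bla)0.75" -> "bla bla"
    prompt = re.sub(r"(?<=\w)[+-]+", "", prompt)             # "ball++" -> "ball"
    return prompt

raw = "a cat playing with a ball++ in the forest, (soft light)0.75"
print(len(tokenizer(raw).input_ids))                       # markup included
print(len(tokenizer(strip_weight_markup(raw)).input_ids))  # markup stripped
```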

In some cases the textual_inversion_manager's pipe.tokenizer.decode / pipe.tokenizer.encode roundtrip changes the token_ids count even with no textual inversions present. That is a bit suspicious, since get_token_ids for pooling is explicitly run with padding and truncation_override set to True. For some prompts this results in a length that is not divisible by 75/77. Not sure if this is the culprit, but it seems noteworthy.
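
The decode/encode roundtrip is easy to check in isolation (again just a sketch using the plain tokenizer API, not compel internals):

```python
def roundtrip_changes_length(tokenizer, prompt: str) -> bool:
    """Return True if a decode -> encode roundtrip changes the token id count."""
    ids = tokenizer(prompt).input_ids
    decoded = tokenizer.decode(ids, skip_special_tokens=True)
    return len(tokenizer(decoded).input_ids) != len(ids)

# e.g. roundtrip_changes_length(pipe.tokenizer, killer_prompt)
```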

Edit: Deactivating self.textual_inversion_manager.expand_textual_inversion_token_ids_if_necessary(token_ids) does make the problematic prompt above pass without raising the error.