damian0815 / compel

A prompting enhancement library for transformers-type text embedding systems


Still get "The size of tensor a (154) must match the size of tensor b (77)" after using the code in the README

Deaddawn opened this issue · comments

Hi there. My code is the following:

```python
import torch
from diffusers import StableDiffusionPipeline
from compel import Compel

pipeline = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1").to('cuda')
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder, truncate_long_prompts=False)
prompt = "a cat playing with a ball++ in the forest, amazing, exquisite, stunning, masterpiece, skilled, powerful, incredible, amazing, trending on gregstation, greg, greggy, greggs greggson, greggy mcgregface,incredible, amazing, trending on gregstation, greg, greggy, greggs greggson, greggy mcgregface,incredible, amazing, trending on gregstation, greg, greggy, greggs greggson, greggy mcgregface"

conditioning = compel.build_conditioning_tensor(prompt)
negative_prompt = "s"
negative_conditioning = compel.build_conditioning_tensor(negative_prompt)
[conditioning, negative_conditioning] = compel.pad_conditioning_tensors_to_same_length([conditioning, negative_conditioning])
print(conditioning.shape)
print(negative_conditioning.shape)
images = pipeline(prompt_embeds=conditioning, negative_prompt_embeds=negative_conditioning, num_inference_steps=50).images[0]
```

I still get this error:

```
File "/root/miniconda/envs/difflate/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 691, in forward
  hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
File "/root/miniconda/envs/difflate/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
  return forward_call(*args, **kwargs)
File "/root/miniconda/envs/difflate/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 219, in forward
  embeddings = inputs_embeds + position_embeddings
RuntimeError: The size of tensor a (154) must match the size of tensor b (77) at non-singleton dimension 1
```

Env: diffusers==0.25.0.dev0, compel==2.0.2
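
As a side note (not part of the original report), a quick way to sanity-check the padding step above is to assert that both conditioning tensors end up with identical shapes and a sequence length that is a multiple of CLIP's 77-token window:

```python
# Illustrative sanity check, not from the original report: after
# pad_conditioning_tensors_to_same_length both tensors should be
# [batch, seq_len, dim] with the same seq_len, and (with chunked long
# prompts) seq_len should be a multiple of 77.
assert conditioning.shape == negative_conditioning.shape, \
    f"shape mismatch: {conditioning.shape} vs {negative_conditioning.shape}"
assert conditioning.shape[1] % 77 == 0, \
    f"unexpected sequence length {conditioning.shape[1]}"
```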

Urgh, sorry about that. I'll take a look.

I'm seeing the same issue; it seems to be related to the prompt wording.

I have a similar issue using SDXL. It's probably a subtle off-by-one error.
Killer prompt: "bla bla bla bla bla bla bla bla bla bla bla bla, blabla bla, bla, bla, (bla bla bla bla)0.75, bla, 50mm, bla 4k bla bla 4k bla bla bla bla bla bla bla bla bla bla bla, 35mm bla, blabla, blabla, bla, blablabla, lomography"

```
( bla|bla|bla|bla|bla|bla|bla|bla|bla|bla|bla|bla|,|bla·bla|bla|,|bla|,|bla|,| )1 ( bla|bla|bla|bla| )0.75 ( ,|bla|,|5|0|mm|,|bla|4|k|bla|bla|4|k|bla|bla|bla|bla|bla|bla|bla|bla|bla|bla|bla|,|3|5|mm|bla|,|bla·bla|,|bla·bla|,|bla|,|bla·bla·bla|,|lom·ography| )1
```

RuntimeError: The size of tensor a (78) must match the size of tensor b (77) at non-singleton dimension 1
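
One way to see how close that prompt sits to the chunk boundary is to tokenize it directly and count the ids (a rough diagnostic of my own; the tokenizer repo id and subfolder are assumptions, any SD/SDXL CLIP tokenizer should behave the same way):

```python
from transformers import CLIPTokenizer

# Rough diagnostic (my own sketch): count raw token ids for the killer
# prompt without any chunking. The count includes BOS/EOS, so a value
# just past a multiple of 77 would line up with the off-by-one above.
tokenizer = CLIPTokenizer.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="tokenizer"
)
killer_prompt = "bla bla bla bla bla bla bla bla bla bla bla bla, blabla bla, bla, bla, (bla bla bla bla)0.75, bla, 50mm, bla 4k bla bla 4k bla bla bla bla bla bla bla bla bla bla bla, 35mm bla, blabla, blabla, bla, blablabla, lomography"
print(len(tokenizer(killer_prompt).input_ids))
```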

This is a bit of a stab in the dark, but I think I can narrow it down to EmbeddingsProvider.get_pooled_embeddings: from there the full raw prompt is fed to the tokenizer, including emphasis syntax like "(bla)0.75". Does anyone know if this is supposed to happen?
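
To test whether the emphasis markup itself inflates the pooled-path token count, a rough experiment (my own sketch; the regex only approximates compel's weighting syntax and is not its real parser) is to strip the markup and compare counts:

```python
import re
from transformers import CLIPTokenizer

# My own sketch, not compel code: strip an approximation of the weighting
# syntax and compare token counts against the raw prompt.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

def strip_weight_markup(prompt: str) -> str:
    """Drop (text)0.75-style weights and trailing +/- emphasis (approximate)."""
    prompt = re.sub(r"\(([^()]*)\)[0-9.]+", r"\1", prompt)   # "(bla bla)0.75" -> "bla bla"
    prompt = re.sub(r"(?<=\w)[+-]+", "", prompt)             # "ball++" -> "ball"
    return prompt

raw = "a cat playing with a ball++ in the forest, (soft light)0.75"
print(len(tokenizer(raw).input_ids))                       # markup included
print(len(tokenizer(strip_weight_markup(raw)).input_ids))  # markup stripped
```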

In some cases the textual_inversion_manager's pipe.tokenizer.decode / pipe.tokenizer.encode roundtrip changes the token_ids count even with no textual inversions present. That is a bit suspicious, since get_token_ids for pooling is explicitly run with padding and truncation_override set to True. For some prompts this results in a length that is not divisible by 75/77. Not sure if this is the culprit, but it seems noteworthy.
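
The decode/encode roundtrip is easy to check in isolation (again just a sketch using the plain tokenizer API, not compel internals):

```python
def roundtrip_changes_length(tokenizer, prompt: str) -> bool:
    """Return True if a decode -> encode roundtrip changes the token id count."""
    ids = tokenizer(prompt).input_ids
    decoded = tokenizer.decode(ids, skip_special_tokens=True)
    return len(tokenizer(decoded).input_ids) != len(ids)

# e.g. roundtrip_changes_length(pipe.tokenizer, killer_prompt)
```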

Edit: Deactivating self.textual_inversion_manager.expand_textual_inversion_token_ids_if_necessary(token_ids) does make the problematic prompt above pass without raising the error.