salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation


The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0

Peter-D-James opened this issue


[screenshot of the error traceback]

When I use beam search to generate a caption for a picture on Colab, this error occurs. My transformers version is 4.25.1, and it works when I use nucleus sampling. How should I solve this?
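For context, the failing call looks roughly like this (a minimal sketch following the repo's demo notebook; the checkpoint path and image file below are placeholders, not from the original post):

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode
from models.blip import blip_decoder  # from this repo

device = 'cuda' if torch.cuda.is_available() else 'cpu'
image_size = 384

# standard BLIP preprocessing from the demo notebook
transform = transforms.Compose([
    transforms.Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])
image = transform(Image.open('demo.jpg').convert('RGB')).unsqueeze(0).to(device)  # image path is illustrative

model = blip_decoder(pretrained='checkpoints/model_base_capfilt_large.pth',  # checkpoint path is an assumption
                     image_size=image_size, vit='base').to(device).eval()

with torch.no_grad():
    # beam search: on transformers 4.25.1 this raises "The size of tensor a (3)
    # must match the size of tensor b (9) at non-singleton dimension 0"
    caption = model.generate(image, sample=False, num_beams=3, max_length=20, min_length=5)
print(caption[0])
```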

[screenshot] I commented out the line that causes the error and uncommented the line below as a temporary solution.

> [screenshot] I commented out the line that causes the error and uncommented the line below as a temporary solution.

Yeah, that's what I did. But it's strange that I can use these two methods in the official demo Colab notebook, but I can't use beam search when I write the code myself.


> [screenshot] I commented out the line that causes the error and uncommented the line below as a temporary solution.

And I succeeded after downgrading the transformers version to 4.16.0. But it seems that I cannot import AutoProcessor when using this version of transformers.
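(As noted above, AutoProcessor was only added to transformers in a release newer than 4.16.0, so after the downgrade the import itself fails:)

```python
import transformers
print(transformers.__version__)  # 4.16.0 after the downgrade

# raises ImportError on 4.16.0: AutoProcessor only exists in newer releases
from transformers import AutoProcessor
```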

same issue

same issue, waiting for a fix

same issue

I suggest opening an issue with kohya_ss in his repo, as he is the one maintaining the behind-the-scenes code. I only wrap his script in a GUI: https://github.com/kohya-ss/sd-scripts

> When I use beam search to generate a caption for a picture on Colab, this error occurs. My transformers version is 4.25.1, and it works when I use nucleus sampling. How should I solve this?

Dude, I also ran into this problem. I changed sample = False to sample = True like this:

[screenshot]

This allows it to run successfully. But I wonder why beam search doesn't work and only nucleus sampling can be used.
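For reference, the sampling call that works (parameter values as in the demo notebook, reusing model and image from the sketch earlier in the thread):

```python
with torch.no_grad():
    # sample=True switches BLIP's generate() to nucleus sampling; the
    # repeat_interleave expansion in models/blip.py is skipped, so no mismatch
    caption = model.generate(image, sample=True, top_p=0.9, max_length=20, min_length=5)
print(caption[0])
```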

> Dude, I also ran into this problem. I changed sample = False to sample = True like this: [screenshot]. This allows it to run successfully. But I wonder why beam search doesn't work and only nucleus sampling can be used.

I have the same problem, and using transformers==4.16 did not help.

I found that commenting out the line at models/blip.py line 131 fixes the problem:

[screenshot]

I don't know why; I hope someone can provide a detailed explanation of what's going on under the hood.

> I found that commenting out the line at models/blip.py line 131 fixes the problem. I don't know why; I hope someone can provide a detailed explanation of what's going on under the hood.

When you comment out this line, the dimension that was 9 becomes 3, so it can run. But this is not beam search, since you just keep one result!

In order to solve this problem, you need to set num_beams=1, not 3 (for instance, in blip_vqa.py line 92).
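The same applies to the captioning model; a sketch, reusing model and image from the earlier example:

```python
with torch.no_grad():
    # with num_beams=1 both expansions are no-ops, so the shapes agree again,
    # but beam search degenerates to greedy decoding
    caption = model.generate(image, sample=False, num_beams=1, max_length=20, min_length=5)
print(caption[0])
```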

I solved this problem. With transformers 4.16.0 everything is OK. But I used transformers 4.36.2; in that case, around line 818 of transformers' generation_utils.py, you need to comment out the _expand_dict_for_generation call, where encoder_hidden_states gets multiplied again by the beam search num!
Finally solved!!!

You should submit a PR to kohya's sd-scripts repo to fix it for good.

Updating to 1.0.2 fixed it for me.

Commenting out these two lines may work:

BLIP/models/blip.py, lines 131 to 132 (at 3a29b74):

```python
if not sample:
    image_embeds = image_embeds.repeat_interleave(num_beams, dim=0)
```

EDIT: After commenting, I noticed yenlianglai had already written this.

Recent transformers versions seem to do the repeat_interleave automatically in _expand_dict_for_generation. The fix in huggingface/transformers#21624 seems to be what causes this issue.
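To make the double expansion concrete, a small sketch (the shapes are illustrative; 577x768 is roughly ViT-B's output for a 384x384 image):

```python
import torch

batch_size, num_beams = 1, 3

# stand-in for the ViT features of a single image
image_embeds = torch.randn(batch_size, 577, 768)

# models/blip.py line 131: BLIP pre-expands the encoder states for beam search
image_embeds = image_embeds.repeat_interleave(num_beams, dim=0)  # (3, 577, 768)

# since huggingface/transformers#21624, generate() expands encoder inputs again
# on its own (inside _expand_dict_for_generation)
image_embeds = image_embeds.repeat_interleave(num_beams, dim=0)  # (9, 577, 768)

# the decoder's input_ids are only expanded once, to batch 3, so cross-attention
# pairs a batch of 3 with a batch of 9:
decoder_states = torch.randn(batch_size * num_beams, 1, 768)     # (3, 1, 768)
try:
    decoder_states + image_embeds[:, :1, :]
except RuntimeError as e:
    print(e)  # The size of tensor a (3) must match the size of tensor b (9)
              # at non-singleton dimension 0
```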