HaozheZhao / MIC

MMICL, a state-of-the-art VLM with in-context learning ability, from PKU



sp_token=32110

xie-qiang opened this issue

Hello, I found that SP_TOKEN should be set to 32110 in the demo; otherwise the image token cannot be replaced, which leads to poor results. Thank you!

outputs = model.generate(
        pixel_values=inputs['pixel_values'],
        input_ids=inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        img_mask=inputs['img_mask'],
        do_sample=False,
        max_length=50,
        min_length=1,
        set_min_padding_size=False,
        sp_token=32110,
)
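
For reference, instead of hardcoding 32110 you can look the ID up from the tokenizer. A minimal sketch, assuming the standard HuggingFace tokenizer API; the exact placeholder string depends on how the MMICL processor registers its image token, so "<image>" below is only an assumption:

# Hypothetical lookup: "<image>" is an assumed placeholder string; use
# whatever special token the MMICL demo actually registers for images.
sp_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")
print(sp_token_id)  # should print 32110 if the demo vocabulary matches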

Hello, I tried the method you mentioned, but encountered an error. Do you have any suggestions? Thank you very much!

Here is the full error message:

shape mismatch leads to truncate. insert embedding tensor of shape torch.Size([96, 4096]) cannot be broadcast to replace placeholder of shape torch.Size([0, 4096])

{
	"name": "RuntimeError",
	"message": "torch.cat(): expected a non-empty list of Tensors",
	"stack": "---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[10], line 17
     14 inputs['pixel_values'] = inputs['pixel_values'].unsqueeze(0)
     16 inputs = inputs.to('cuda:0')
---> 17 outputs = model.generate(
     18         pixel_values = inputs['pixel_values'],
     19         input_ids = inputs['input_ids'],
     20         attention_mask = inputs['attention_mask'],
     21         img_mask = inputs['img_mask'],
     22         do_sample=False,
     23         max_length=50,
     24         min_length=1,
     25         set_min_padding_size =False,
     26         sp_token = 32110
     27 )
     28 generated_text = processor.batch_decode(outputs, skip_special_tokens=True)[0].strip()
     29 print(generated_text)

File ~/anaconda3/envs/mmicl/lib/python3.8/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/hjz/harmful_meme_detection/mmicl/model/instructblip/modeling_instructblip.py:2129, in InstructBlipForConditionalGeneration.generate(self, pixel_values, qformer_input_ids, qformer_attention_mask, input_ids, attention_mask, img_mask, set_min_padding_size, sp_token, **generate_kwargs)
   2126         index+= i_count*img_token_szie
   2127     img_idx +=1
-> 2129 insert_embeds = torch.concat(insert_embeds_list, dim=0)
   2130 try:
   2131     inputs_embeds[image_embeds_index] = insert_embeds

RuntimeError: torch.cat(): expected a non-empty list of Tensors"
}
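
The warning above says the model built an insert tensor of 96 image-embedding rows but found 0 placeholder positions in input_ids to receive them, so insert_embeds_list stays empty and torch.cat() fails. A quick diagnostic sketch, assuming inputs is the same dict passed to generate (the 96 rows would correspond to, e.g., 3 images × 32 query tokens each, though that split is an assumption):

# Diagnostic sketch: count the positions in input_ids equal to sp_token.
# Zero matches means 32110 is not the placeholder ID this processor
# actually emits, so the image embeddings have nowhere to go.
sp_token = 32110
n_placeholders = (inputs['input_ids'] == sp_token).sum().item()
print(n_placeholders)  # the error implies this prints 0; 96 is expected

If this prints 0, the token ID and the processor vocabulary are out of sync, which usually means the demo code and the checkpoint expect different special tokens.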