salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Question or bug in blip_pretrain.py

LiGuo12 opened this issue

Between lines 54-75, after "self.text_encoder = BertModel.from_pretrained('bert-base-uncased', config=encoder_config, add_pooling_layer=False)", where "encoder_config" is loaded from 'configs/bert_config.json', the vocab_size is 31090. Next, after "self.text_encoder.resize_token_embeddings(len(self.tokenizer))", the vocab_size changes to 31092. However, "self.text_encoder_m" is never resized, so its vocab_size is still 31090. As a result, "self.text_encoder" and "self.text_encoder_m" have mismatched embedding sizes, which breaks "self.copy_params()".
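To make the mismatch concrete, here is a minimal standalone sketch (vocab sizes taken from the numbers above, a hidden size of 768 assumed) of what happens when parameters are copied pairwise between the two encoders, which is how I understand "self.copy_params()" to work:

```python
import torch
import torch.nn as nn

# Standalone sketch of the shape mismatch (vocab sizes from above, hidden
# size of 768 assumed): copying a resized embedding table into an un-resized
# one fails, which is what a pairwise copy inside copy_params() runs into.
resized = nn.Embedding(31092, 768)      # stands in for self.text_encoder's word embeddings
not_resized = nn.Embedding(31090, 768)  # stands in for self.text_encoder_m's word embeddings

with torch.no_grad():
    # Raises RuntimeError: the two weight tensors have different shapes.
    not_resized.weight.data.copy_(resized.weight.data)
```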

Is this a bug? I think there should be a "self.text_encoder_m.resize_token_embeddings(len(self.tokenizer))" after line 67.
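Concretely, the change I am suggesting is just this one line in the constructor of blip_pretrain.py (an untested sketch, mirroring what is already done for "self.text_encoder"):

```python
# Proposed addition right after self.text_encoder_m is created (line 67),
# so that both encoders have the same vocab size before self.copy_params():
self.text_encoder_m.resize_token_embeddings(len(self.tokenizer))
```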