LLaVA-VL / LLaVA-NeXT


Strange Model Loading Issue: Inconsistency with Vision Tower Parameters

shidingz opened this issue · comments

I found something strange when loading the model. It seems that the vision_tower was unfrozen (released) during training, but when the model is loaded, the gradient-updated vision_tower parameters are not restored; instead, the original vision tower is loaded by name via mm_vision_tower. Can you explain why this happens?
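For context, here is a minimal sketch of how the vision tower appears to be constructed from the config (paraphrasing my reading of the repo's multimodal encoder builder; the exact class and module layout may differ):

import torch
from transformers import CLIPVisionModel

class CLIPVisionTower(torch.nn.Module):
    def __init__(self, vision_tower_name):
        super().__init__()
        # Weights are fetched purely by name. With the released config this
        # name is "openai/clip-vit-large-patch14-336", i.e. the original
        # CLIP weights, not whatever the checkpoint trained.
        self.vision_tower = CLIPVisionModel.from_pretrained(vision_tower_name)

def build_vision_tower(config):
    # config.mm_vision_tower comes straight from the released config.json
    return CLIPVisionTower(config.mm_vision_tower)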

Which model are you using?

@FengLi-ust I'm using llama3-llava-next-8b, and almost all released models have the same issue:

the released config.json loads the vision_tower from the original OpenAI repo:

"mm_vision_tower": "openai/clip-vit-large-patch14-336",

Does that introduce a discrepancy with the well-trained vision_tower? To pin down the problem, I changed the vision_tower path in config.json to point at the released checkpoint itself:

"mm_vision_tower": "../pretrained_models/llama3-llava-next-8b",
model_path = "../pretrained_models/llama3-llava-next-8b"
model = LlavaLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, 
                                              device_map=device_map,
                                              attn_implementation='flash_attention_2', config=llava_cfg, )

I then saw warnings like these:

Some weights of CLIPVisionModel were not initialized from the model checkpoint at ../pretrained_models/llama3-llava-next-8b and are newly initialized: ['vision_model.embeddings.class_embedding', 
'vision_model.embeddings.patch_embedding.weight', 'vision_model.embeddings.position_embedding.weight', 'vision_model.encoder.layers.0.layer_norm1.bias', 'vision_model.encoder.layers.0.layer_norm1.weight', 
'vision_model.en
...
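These warnings suggest that CLIPVisionModel.from_pretrained finds no weights under the bare vision_model.* keys it expects in that directory; if the trained vision tower was saved at all, it would presumably sit under the language model's own prefix. One way to check is to list the checkpoint's weight index (a minimal sketch, assuming a sharded safetensors checkpoint with a standard Hugging Face index file; the path is the one used above):

import json

# Hypothetical diagnostic: list which keys in the released checkpoint
# mention the vision tower, to see under what prefix (if any) its
# parameters were actually saved.
index_path = "../pretrained_models/llama3-llava-next-8b/model.safetensors.index.json"
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

vision_keys = sorted(k for k in weight_map if "vision_tower" in k)
print(f"{len(vision_keys)} vision-tower entries in the checkpoint")
print(vision_keys[:5])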