OpenGVLab / LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Not able to reproduce results of LLaMA-Adapter V2

dmlpt opened this issue

Hi,

I am trying to reproduce the results of LLaMA-Adapter V2. I am fine-tuning the model on the "alpaca_gpt4_data" and "llava_instruct_150k" datasets, using the settings from https://github.com/OpenGVLab/LLaMA-Adapter/blob/a50befee3fdde8a08ca346b2ec70407e59ff6536/llama_adapter_v2_multimodal7b/exps/finetune.sh

I used the pre-trained model from https://huggingface.co/Cxxs/ImageBind-LLM/resolve/main/7B-pretrained.pth as the starting point for fine-tuning.

When I evaluate the model using https://github.com/OpenGVLab/LLaMA-Adapter/blob/a50befee3fdde8a08ca346b2ec70407e59ff6536/llama_adapter_v2_multimodal7b/util/evaluate_mme.py (with all three of w_bias, w_lora, and w_new_gate set to False; load_state_dict reports no missing keys after loading), I get the results below, which are close to random chance:

=========== Perception ===========
total score: 497.44607843137254

 existence  score: 50.0
 count  score: 50.0
 position  score: 50.0
 color  score: 50.0
 posters  score: 66.66666666666667
 celebrity  score: 28.52941176470588
 scene  score: 50.0
 landmark  score: 50.0
 artwork  score: 52.25
 OCR  score: 50.0

=========== Cognition ===========
total score: 248.57142857142858

 commonsense_reasoning  score: 53.57142857142858
 numerical_calculation  score: 50.0
 text_translation  score: 95.0
 code_reasoning  score: 50.0

Am I making any mistake in evaluating the model?
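
For reference, here is a minimal sketch of the load_state_dict check mentioned above. The helper name is my own, and the model construction is omitted (it follows the repo's demo code, with w_bias, w_lora, and w_new_gate all set to False), so treat this as illustrative rather than the exact evaluation script:

```python
import torch
from torch import nn


def check_checkpoint_load(model: nn.Module, ckpt_path: str) -> None:
    """Load checkpoint weights non-strictly and report any key mismatches."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # Some checkpoints wrap the weights under a "model" key; unwrap if present.
    state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
    result = model.load_state_dict(state_dict, strict=False)
    print("missing keys:", result.missing_keys)
    print("unexpected keys:", result.unexpected_keys)


# Usage (model built elsewhere, e.g. by the repo's demo code):
# check_checkpoint_load(model, "7B-pretrained.pth")
```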

Note: I do get the reported results when I use the model downloaded from https://github.com/OpenGVLab/LLaMA-Adapter/releases/download/v.2.1.0/427dbc27bf62a3ef7a24ffd3ed2c3162_LORA-BIAS-7B-v21.pth

Thanks in advance!

Can you share your training log?