MILVLG / imp

A family of highly capable yet efficient large multimodal models

Model Evaluation

BeachWang opened this issue

Hi,

I have trained imp with LoRA. However, it does not run inference when I run the evaluation scripts.
The following is the output when I evaluate POPE.
[Screenshot 2024-02-17 12:17:51 PM]

Hi, thanks for your interest. You can add --model-base microsoft/phi-2 \ to the following block in pope.sh (see the sketch after the excerpt below). It should then work and the script should start running.

imp/scripts/eval/pope.sh

Lines 22 to 29 in 48bdc60

--model-path $MODEL_PATH \
--question-file ./playground/data/eval/pope/llava_pope_test.jsonl \
--image-folder ./playground/data/eval/pope/val2014 \
--answers-file ./playground/data/eval/pope/answers/$SPLIT/$CKPT/${CHUNKS}_${IDX}.jsonl \
--num-chunks $CHUNKS \
--chunk-idx $IDX \
--temperature 0 \
--conv-mode phi2 &
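
For reference, a minimal sketch of the modified block (only the --model-base line is added; everything else is unchanged from the excerpt above):

--model-path $MODEL_PATH \
--model-base microsoft/phi-2 \
--question-file ./playground/data/eval/pope/llava_pope_test.jsonl \
--image-folder ./playground/data/eval/pope/val2014 \
--answers-file ./playground/data/eval/pope/answers/$SPLIT/$CKPT/${CHUNKS}_${IDX}.jsonl \
--num-chunks $CHUNKS \
--chunk-idx $IDX \
--temperature 0 \
--conv-mode phi2 &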

We will fix this problem as soon as possible. Let me know if you have any other questions.

Hi, I have added --model-base, but there is another problem.
[Screenshot 2024-02-19 12:06:50 PM]

Hi @BeachWang! This bug is also caused by an update to the phi-2 repo itself. Specifically, the phi-2 team changed the parameter names in the model weights, i.e., the keys in the .safetensors files. As a result, there is a mismatch between the current phi-2 model weights and our codebase, which leads to this loading error.

Our code cannot load the "new" version of phi-2 unless we change it to the new model definition of phi-2, but in that case old checkpoints would also become unusable. So we have decided to keep our code as-is and recommend that users download the old version of the phi-2 repo.

Specifically, you can run the following Python script to download phi-2 to a local folder:

import os
# os.environ["https_proxy"] = "http://xxx.xxx.xxx.xxx:xx"  # in case you need proxy to access Huggingface Hub
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="microsoft/phi-2",
    revision="d3186761bf5c4409f7679359284066c25ab668ee",  # pinned pre-update ("old") phi-2 snapshot
    local_dir='checkpoints/base/phi-2',
    local_dir_use_symlinks=False  # store real files instead of symlinks into the HF cache
)

Then use checkpoints/base/phi-2 as the base model argument (i.e., --model-base) in future experiments.
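
A quick sanity check and wiring example (a sketch; BASE_MODEL is just an illustrative shell variable, and the listed contents are what a complete phi-2 snapshot should include):

# the local snapshot should contain config.json, tokenizer files, and the *.safetensors shards
BASE_MODEL=checkpoints/base/phi-2
ls "$BASE_MODEL"

# then point the eval scripts at the local copy, e.g. in pope.sh:
#   --model-base checkpoints/base/phi-2 \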

Hi @ParadoxZW. Do we need to roll back the code for #7 if we use the old version of phi-2, given that the training code has been made compatible with the new version of phi-2?

No need to roll back. The code should always work with the old version of phi-2, for both training and evaluation.

See our latest update :) @BeachWang

Hi,
I have succeeded in reproducing your work, but I found some problems in the evaluation of POPE and SQA. Specifically, it should be $EVAL_CKPT rather than $CKPT on lines 27 and 47 of pope.sh (see the sketch at the end of this comment). As for SQA, I got the following error when running sqa.sh.
[Screenshot 2024-02-26 2:33:44 PM]
The reason seems to be that imp uses model_vqa_loader to evaluate SQA, whereas LLaVA actually uses model_vqa_science.
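
The pope.sh fix mentioned above would look something like this (a sketch; it assumes lines 27 and 47 build the answers path the same way as the excerpt above, with EVAL_CKPT being the checkpoint variable defined elsewhere in the script):

# replace $CKPT with $EVAL_CKPT in the answers-file path
--answers-file ./playground/data/eval/pope/answers/$SPLIT/$EVAL_CKPT/${CHUNKS}_${IDX}.jsonl \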

Hi, thanks for the reminder.
We will fix these bugs in the next update.

For ScienceQA, you should use model_vqa_science with llava_test_CQM-A.json. We are also rewriting a question file, scienceqa_multi.jsonl, which follows the multiple-choice prompt in LLaVA's Evaluation.md and works with model_vqa_loader. You will see the details in the next update as well.
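
For reference, a sketch of what such a model_vqa_science call could look like, following the pattern in LLaVA's Evaluation.md (the module path, image-folder layout, and answers-file naming are assumptions borrowed from LLaVA rather than taken from imp's actual sqa.sh, so adjust them to the updated script):

# adapted from LLaVA's ScienceQA evaluation; module name and paths are assumptions
python -m llava.eval.model_vqa_science \
    --model-path $MODEL_PATH \
    --model-base checkpoints/base/phi-2 \
    --question-file ./playground/data/eval/scienceqa/llava_test_CQM-A.json \
    --image-folder ./playground/data/eval/scienceqa/images/test \
    --answers-file ./playground/data/eval/scienceqa/answers/$EVAL_CKPT.jsonl \
    --single-pred-prompt \
    --temperature 0 \
    --conv-mode phi2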