Model Evaluation
BeachWang opened this issue
Hi, thanks for your interest. You can add `--model-base microsoft/phi-2 \` to the evaluation command in pope.sh (lines 22 to 29 in 48bdc60). Then it should work and start running the scripts.
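A minimal sketch of what the edited command might look like. Everything except `--model-base` is illustrative, following LLaVA-style evaluation scripts; the actual module name, paths, and flags in pope.sh may differ:

```sh
# Hypothetical sketch: only --model-base comes from the suggestion above.
# The module name, checkpoint path, and file paths follow LLaVA conventions
# and may not match pope.sh exactly.
python -m llava.eval.model_vqa_loader \
    --model-path checkpoints/imp-v1-3b \
    --model-base microsoft/phi-2 \
    --question-file ./playground/data/eval/pope/llava_pope_test.jsonl \
    --answers-file ./playground/data/eval/pope/answers/imp-v1-3b.jsonl
```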
And we will fix this problem as soon as possible. Let me know if there are any other questions.
Hi @BeachWang! This bug is also caused by an update of the phi-2 repo itself. More specifically, the phi-2 team has changed the parameter names in the model weights, i.e., the keys in the .safetensors files. Thus there is a severe mismatch between the current phi-2 model weights and our codebase, which leads to this loading error.
Our code cannot load the "new version" of phi-2 unless we change our code to the new model definition of phi-2. But in that case, old checkpoints would also become unavailable. So we have decided to keep our code as-is and recommend that users download the old version of the phi-2 repo.
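If you are unsure which version of phi-2 you have locally, you can inspect the parameter names in the weight shards. A minimal sketch, assuming the `safetensors` package is installed; the shard file name below is illustrative:

```sh
# Print a few parameter names from one shard to see which naming scheme the
# weights use. The shard file name is illustrative and may differ locally.
python - <<'EOF'
from safetensors import safe_open

with safe_open("checkpoints/base/phi-2/model-00001-of-00002.safetensors", framework="pt") as f:
    for key in list(f.keys())[:5]:
        print(key)
EOF
```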
Specifically, you can run the following Python script to download phi-2 to a local folder:
```python
import os
# os.environ["https_proxy"] = "http://xxx.xxx.xxx.xxx:xx"  # in case you need a proxy to access the Hugging Face Hub
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="microsoft/phi-2",
    revision="d3186761bf5c4409f7679359284066c25ab668ee",  # old revision, before the parameter renaming
    local_dir="checkpoints/base/phi-2",
    local_dir_use_symlinks=False,  # store real files instead of symlinks into the cache
)
```
Then use `checkpoints/base/phi-2` as the base model argument in future experiments.
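For example, the flag from the earlier suggestion would then point at the local folder instead of the Hub id (illustrative sketch):

```sh
    --model-base checkpoints/base/phi-2 \
```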
Hi @ParadoxZW. Do we need to roll back the code from #7 if we use the old version of phi-2, since the training code has been made compatible with the new version of phi-2?
No need to roll back. The code should always work with the old version of phi-2, for both training and evaluation.
See our latest update :) @BeachWang
Hi,
I have succeeded in reproducing your work, but I found some problems in the evaluation of POPE and SQA. Specifically, it should be `$EVAL_CKPT` rather than `$CKPT` in lines 27 and 47 of pope.sh.
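A sketch of the fix; the `--answers-file` path is illustrative (following LLaVA's layout), and only the variable substitution is the point:

```sh
# before (buggy):
#   --answers-file ./playground/data/eval/pope/answers/$CKPT.jsonl \
# after (fixed):
    --answers-file ./playground/data/eval/pope/answers/$EVAL_CKPT.jsonl \
```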
As for SQA, I got an error when running sqa.sh. The reason seems to be that imp uses model_vqa_loader to evaluate SQA, whereas LLaVA actually uses model_vqa_science.
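A hypothetical sketch of the corresponding change in sqa.sh, assuming LLaVA-style module and file paths; the actual entry point names and flags in this repo may differ:

```sh
# Swap the generic loader for the ScienceQA-specific entry point, as LLaVA does.
# All paths and flags below follow LLaVA's sqa.sh script and are illustrative.
python -m llava.eval.model_vqa_science \
    --model-path $EVAL_CKPT \
    --question-file ./playground/data/eval/scienceqa/llava_test_CQM-A.json \
    --image-folder ./playground/data/eval/scienceqa/images/test \
    --answers-file ./playground/data/eval/scienceqa/answers/$EVAL_CKPT.jsonl \
    --single-pred-prompt
```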
Hi, thanks for the reminder.
We will fix these bugs in the next update.
For ScienceQA, you should use model_vqa_science for llava_test_CQM-A.json, and we are rewriting a question file, scienceqa_multi.jsonl, which follows the multiple-choice prompt in LLaVA's Evaluation.md and fits model_vqa_loader. You can see the details in the next update too.
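For reference, a hypothetical example of what one line of scienceqa_multi.jsonl might look like, assuming the question schema model_vqa_loader reads (question_id / image / text) and a multiple-choice prompt in the style of LLaVA's Evaluation.md; the actual file may differ:

```sh
# One made-up example line: the field names follow model_vqa_loader's expected
# schema, but the question content and image path are invented for illustration.
cat <<'EOF' >> scienceqa_multi.jsonl
{"question_id": 0, "image": "test/0/image.png", "text": "Which of these states is farthest north?\nA. West Virginia\nB. Louisiana\nC. Arizona\nD. Oklahoma\nAnswer with the option's letter from the given choices directly."}
EOF
```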