DAMO-NLP-SG / VCD

[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Are special hyperparameter values or checkpoints needed?

yejipark-m opened this issue

Hello, thanks for sharing your nice work.

I've encountered a problem trying to reproduce the reported results, even after accounting for the standard deviations. I'm using the following model checkpoints from Hugging Face:

model_paths[instructblip]="~/.cache/huggingface/hub/models--lmsys--vicuna-7b-v1.1"
model_paths[llava]="liuhaotian/llava-v1.5-7b"
model_paths[qwenvl]="Qwen/Qwen-VL"

I've kept the hyperparameters at their default settings:

python3 eval/object_hallucination_vqa_${model}.py --model-path ${model_paths[$model]} --question-file data/POPE/aokvqa/aokvqa_pope_${type}.json --image-folder data/MSCOCO/val2014 --answers-file ./output/${model}/aokvqa_pope_${type}_vcd.jsonl --use_cd

parser.add_argument("--noise_step", type=int, default=500)
parser.add_argument("--use_cd", action='store_true', default=False)
parser.add_argument("--cd_alpha", type=float, default=1)
parser.add_argument("--cd_beta", type=float, default=0.1)
parser.add_argument("--seed", type=int, default=42)

However, with these checkpoints and hyperparameters, my numbers are significantly lower than the reported performance, particularly on the GQA and A-OKVQA datasets. The results without VCD are close to the reported numbers, so the issue seems to lie in the VCD decoding itself.

Could you specify which checkpoints you used for each model? Sharing the exact inference setup or recipe would be greatly appreciated.

Hi, thanks for your interest. Please refer to "Implementation Details" in Section 4.1 and Appendix A for the hyperparameter settings of each experiment.

Thanks for your reply. I had misconfigured the noise step when evaluating POPE.
I'll close the issue.
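For future readers hitting the same thing: --noise_step sets how many forward diffusion steps are applied to the input image to produce the distorted copy that VCD contrasts against. A minimal sketch of that distortion under a standard DDPM linear beta schedule, assuming 1000 total steps and illustrative names:

```python
import torch

def add_diffusion_noise(image_tensor, noise_step=500, total_steps=1000):
    """Distort an image with the DDPM forward process:
    x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps.

    Larger noise_step -> heavier distortion of the visual input.
    """
    betas = torch.linspace(1e-4, 0.02, total_steps)     # linear beta schedule
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative a_bar_t
    a_bar = alphas_cumprod[noise_step - 1]
    noise = torch.randn_like(image_tensor)              # eps ~ N(0, I)
    return a_bar.sqrt() * image_tensor + (1.0 - a_bar).sqrt() * noise
```

At the default of 500 out of 1000 steps, the distorted image keeps coarse layout but loses fine detail, which is what the contrastive term relies on; a very different noise_step changes the contrast strength, which is presumably why a misconfigured value shifts the POPE numbers.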