EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Home Page:https://lmms-lab.github.io/lmms-eval-blog/


LLaVA-1.6 Mistral ScienceQA Performance

daniel-z-kaplan opened this issue

Hello,

Looking at the chart provided, Mistral-7B achieves a score of 0.23/100 on ScienceQAFull.
I am able to replicate this, but it is obviously very strange: the other comparison models score around 73.

Hi, @daniel-z-kaplan

Based on our logs, this is likely because Mistral-7B tends to generate an empty space, causing the exact match to give a zero score. The same applies to GQA and AI2D.
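For illustration, here is a minimal sketch of the failure mode, not lmms-eval's actual scorer (the function names are hypothetical): a bare string comparison returns zero whenever the prediction carries stray whitespace, while normalizing both sides before comparison recovers answers that are correct apart from whitespace. Note that a genuinely empty generation still scores zero either way, so the underlying generation issue would also need fixing.

```python
def exact_match(prediction: str, target: str) -> float:
    """Raw exact match: any stray whitespace or casing mismatch scores zero."""
    return 1.0 if prediction == target else 0.0


def normalized_exact_match(prediction: str, target: str) -> float:
    """Strip surrounding whitespace and lowercase both sides before matching."""
    return 1.0 if prediction.strip().lower() == target.strip().lower() else 0.0


if __name__ == "__main__":
    gold = "A"
    print(exact_match(" ", gold))              # 0.0 -- empty-space generation, as reported
    print(exact_match(" A", gold))             # 0.0 -- correct answer, stray leading space
    print(normalized_exact_match(" A", gold))  # 1.0 -- normalization recovers it
    print(normalized_exact_match(" ", gold))   # 0.0 -- truly empty output still fails
```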

This will be fixed in our next release, and new results will be updated.

Any updates? I ran into the same problem: the model liuhaotian/llava-v1.6-vicuna-7b always generates an empty string.