EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Home Page:https://lmms-lab.github.io/lmms-eval-blog/


LLaVA-1.6 Mistral ScienceQA Performance

daniel-z-kaplan opened this issue

Hello,

Looking at the chart provided, Mistral-7B achieves a score of 0.23/100 on ScienceQAFull.
I am able to replicate this, but it is obviously very strange: the other comparison models score around 73.

Hi, @daniel-z-kaplan

Based on our logs, this is likely because Mistral-7B tends to generate an empty space, causing the exact match to give a zero score. The same applies to GQA and AI2D.
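For illustration, here is a minimal sketch of the failure mode, not lmms-eval's actual scorer (the function names are hypothetical): a bare string comparison returns zero whenever the prediction carries stray whitespace, while normalizing both sides before comparison recovers answers that are correct apart from whitespace. Note that a genuinely empty generation still scores zero either way, so the underlying generation issue would also need fixing.

```python
def exact_match(prediction: str, target: str) -> float:
    """Raw exact match: any stray whitespace or casing mismatch scores zero."""
    return 1.0 if prediction == target else 0.0


def normalized_exact_match(prediction: str, target: str) -> float:
    """Strip surrounding whitespace and lowercase both sides before matching."""
    return 1.0 if prediction.strip().lower() == target.strip().lower() else 0.0


if __name__ == "__main__":
    gold = "A"
    print(exact_match(" ", gold))              # 0.0 -- empty-space generation, as reported
    print(exact_match(" A", gold))             # 0.0 -- correct answer, stray leading space
    print(normalized_exact_match(" A", gold))  # 1.0 -- normalization recovers it
    print(normalized_exact_match(" ", gold))   # 0.0 -- truly empty output still fails
```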

This will be fixed in our next release, and new results will be updated.

Any updates? I ran into the same problem: the model liuhaotian/llava-v1.6-vicuna-7b always generates an empty string.