EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Home Page: https://lmms-lab.github.io/lmms-eval-blog/

Questions about the evals

CrossLee1 opened this issue · comments

Thanks for your great work!

I have some questions about the evaluation results.

  1. For TextVQA, LLaVA-1.5-13B scores 61.3 in the paper, but in the sheet https://docs.google.com/spreadsheets/d/1a5ImfdKATDI8T7Cwh6eH-bEsnQFzanFraFUgcS9KHWc/edit#gid=0 the result is only 48.73. Why?
  2. After executing the command from the README, the generated MMBench submission file seems wrong. When I upload it to the evaluation server, I get the error "Your excel file should have a column named A, please double check and submit again".

Hope to get your response, thanks ~

I think it's because LLaVA reports the test-split result, while we are reporting the val split.

Running textvqa produces both the val-split metric and a submission file for the test split.

Users can then submit that file to https://eval.ai/web/challenges/challenge-page/874/ to get the test results.
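Before uploading, a quick sanity check can save a rejected submission. The sketch below is only illustrative: the file path is a placeholder, and the expected structure (a JSON list of `{"question_id", "answer"}` entries) is an assumption based on the standard TextVQA eval.ai format.

```python
import json

# Placeholder path -- the actual filename depends on your --output_path and run.
submission_path = "./logs/textvqa_test_submission.json"

with open(submission_path) as f:
    preds = json.load(f)

# Assumed eval.ai format: a JSON list of {"question_id": int, "answer": str} entries.
assert isinstance(preds, list), "submission should be a JSON list"
for entry in preds[:5]:
    assert "question_id" in entry and "answer" in entry, f"unexpected entry: {entry}"
print(f"{len(preds)} predictions look structurally OK")
```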

Maybe @pufanyi could address this more clearly.

Hello! Thank you for your interest in our work! Regarding question 1: as referenced in the LLaVA evaluation code here, the OCR tokens are included in the prompt during evaluation.

However, in our evaluation of LLaVA, we did not include the OCR tokens.

To reproduce the results reported in the LLaVA paper, you can set the OCR option to true.
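To illustrate the difference, here is a minimal sketch (not the actual lmms-eval code; the function name is hypothetical, and the "Reference OCR token" wording follows LLaVA's evaluation data):

```python
def build_textvqa_prompt(question: str, ocr_tokens: list[str], use_ocr: bool) -> str:
    """Build a TextVQA prompt, optionally appending OCR tokens the way
    LLaVA's evaluation setup does (illustrative sketch only)."""
    if use_ocr and ocr_tokens:
        return f"{question}\nReference OCR token: {', '.join(ocr_tokens)}"
    return question

# With the OCR option off (our default), the model only sees the question,
# which explains the lower score compared to the paper.
print(build_textvqa_prompt("What brand is the laptop?", ["dell", "inspiron"], use_ocr=True))
```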

@Luodian @pufanyi Thanks for your reply. As for question 2, do you have any ideas?

As for MMBench, we are updating it in a new PR, #7.

But we are waiting for the run to finish so we can check the eval result before deciding to merge it. You can also help to check whether it's correct.
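In the meantime, one way to check a generated MMBench submission locally before uploading is sketched below. The file path is a placeholder, and the expected column names are inferred from the server's error message and the MMBench submission format, so treat them as assumptions.

```python
import pandas as pd

# Placeholder path to the generated MMBench submission spreadsheet.
submission_path = "./logs/mmbench_en_submission.xlsx"

df = pd.read_excel(submission_path)

# Columns the evaluation server appears to expect (per the error message
# and the MMBench format); adjust to match the official documentation.
expected = ["index", "question", "A", "B", "C", "D", "prediction"]
missing = [c for c in expected if c not in df.columns]
print("missing columns:", missing or "none")
```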