ZrrSkywalker / MathVerse

Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

InternVL benchmark result

nhhviet98 opened this issue

First, thank you very much for your contribution. 💯 💯 💯

In MathVerse, you have shown that most MLLMs solve problems by relying on "Text Redundancy".

I saw that InternVL scales up the vision encoder to reduce the gap between visual and textual information, and it has also achieved Top 1 on MathVista.

Can you provide the benchmark results of InternVL on the MathVerse dataset? I think they would add useful evidence for your hypothesis.

Reference paper:

https://arxiv.org/pdf/2312.14238.pdf

Thanks for your interest! Sure, InternVL is an insightful work with superior performance across different benchmarks. We will update the results on the leaderboard soon.