OpenGVLab / ChartAst

ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.

Discrepancy in Data Count Between Paper and Huggingface Dataset

nth2000 opened this issue

First of all, thank you for your outstanding work!

I noticed that the chart_upload.json file in the Huggingface dataset contains 2,633,068 entries. However, Table 1 in your paper mentions a total of 39M data samples. Which part of the data has not yet been released, and are there any plans to release the remaining samples?
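For anyone who wants to reproduce the count, here is a minimal sketch. It assumes chart_upload.json has been downloaded locally and is a single top-level JSON array with one element per sample; the real file layout may differ, and `count_entries` is just an illustrative helper name.

```python
import json


def count_entries(path):
    """Return the number of top-level records in a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the file holds one JSON array (or object) of samples,
    # so len() gives the number of entries.
    return len(data)


# Example call (the path is wherever the file was saved locally):
# print(count_entries("chart_upload.json"))
```

Note that this loads the whole file into memory, which may be slow for a file with millions of entries.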

Many thanks!

Hello, we have not uploaded all of the image-table pairs, because they are simply generated with plotting APIs (matplotlib, etc.) and are easy to regenerate. We also did not upload the existing open-source datasets.
Therefore, we uploaded the most important part of our dataset: MathQA, ReferQA (which use CoT answers), and the arXiv dataset.
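To illustrate what "generated by some APIs" could look like, here is a rough sketch of producing one image-table pair. This is not the authors' actual pipeline: the table schema, the random values, the chart type, and the helper names (`random_table`, `table_to_csv`, `render_bar_chart`) are all assumptions made up for the example.

```python
import csv
import io
import random


def random_table(n_rows=5, seed=0):
    """Build a small synthetic table (hypothetical two-column schema)."""
    rng = random.Random(seed)
    header = ["category", "value"]
    rows = [[f"item_{i}", round(rng.uniform(0, 100), 1)] for i in range(n_rows)]
    return header, rows


def table_to_csv(header, rows):
    """Serialize the table as CSV text, the 'table' half of the pair."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def render_bar_chart(header, rows, image_path):
    """Render the table as a bar chart image, the 'image' half of the pair."""
    # Imported here so the table helpers work even without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # headless backend, no display needed
    import matplotlib.pyplot as plt

    labels = [row[0] for row in rows]
    values = [row[1] for row in rows]
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(labels, values)
    ax.set_xlabel(header[0])
    ax.set_ylabel(header[1])
    fig.tight_layout()
    fig.savefig(image_path, dpi=100)
    plt.close(fig)


# Example: one pair, written as a PNG plus its CSV text.
# header, rows = random_table(n_rows=6, seed=42)
# render_bar_chart(header, rows, "pair_000.png")
# print(table_to_csv(header, rows))
```

Looping over seeds and chart types would yield as many pairs as needed.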

We also found that ChartAst-S does not need all of the data: similar performance can be achieved using the uploaded data together with other open-source datasets.