vis-nlp / UniChart


Does the pretraining corpus contain the test and val images in ChartQA?

Evanwu1125 opened this issue

I've recently benchmarked UniChart on some of my own questions over ChartQA images, but I find that it does not perform as well as you report in the paper.

This is my test code, copied from the Hugging Face model card:

from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch


model_name = "ahmed-masry/unichart-chartqa-960"
image_path = "../images/6.png"
input_prompt = "<chartqa> What is average value of all the 'Female' bars? <s_answer>"

# Load the ChartQA-finetuned checkpoint and its Donut processor
model = VisionEncoderDecoderModel.from_pretrained(model_name)
processor = DonutProcessor.from_pretrained(model_name)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Preprocess the chart image and tokenize the prompt
image = Image.open(image_path).convert("RGB")
decoder_input_ids = processor.tokenizer(input_prompt, add_special_tokens=False, return_tensors="pt").input_ids
pixel_values = processor(image, return_tensors="pt").pixel_values

outputs = model.generate(
    pixel_values.to(device),
    decoder_input_ids=decoder_input_ids.to(device),
    max_length=model.decoder.config.max_position_embeddings,
    early_stopping=True,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
    num_beams=4,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
    return_dict_in_generate=True,
)

# Decode and keep only the text after the <s_answer> tag
sequence = processor.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = sequence.split("<s_answer>")[1].strip()
print(sequence)

This is the chart I tested (the attached image, 6.png).
This is the answer UniChart returned:
20000
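
For what it's worth, the arithmetic can be sanity-checked outside the model by first asking for the underlying data table and then averaging the 'Female' column in Python. Below is a rough sketch continuing from the script above; if I'm reading the model card right, the base checkpoint ahmed-masry/unichart-base-960 exposes an <extract_data_table> prompt, but the "&" / "|" table delimiters I parse with are only a guess, so inspect the raw output first.

# Rough sketch: extract the chart's data table, then do the averaging in Python
# instead of asking the model to do the arithmetic.
# Assumptions: the <extract_data_table> prompt and the base checkpoint name are
# as I recall from the UniChart model card; the delimiters below are a guess.
base_name = "ahmed-masry/unichart-base-960"
model = VisionEncoderDecoderModel.from_pretrained(base_name).to(device)
processor = DonutProcessor.from_pretrained(base_name)

pixel_values = processor(image, return_tensors="pt").pixel_values
table_prompt = "<extract_data_table> <s_answer>"
decoder_input_ids = processor.tokenizer(table_prompt, add_special_tokens=False, return_tensors="pt").input_ids

outputs = model.generate(
    pixel_values.to(device),
    decoder_input_ids=decoder_input_ids.to(device),
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
    num_beams=4,
    return_dict_in_generate=True,
)

table = processor.batch_decode(outputs.sequences)[0]
table = table.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
table = table.split("<s_answer>")[1].strip()
print(table)  # inspect the raw table string before trusting the parsing below

# Hypothetical layout: "Year | Female | Male & 2015 | 20000 | 25000 & ..."
# (header row first, "&" between rows, "|" between cells) -- adjust to what you see.
rows = [[cell.strip() for cell in row.split("|")] for row in table.split("&")]
header, body = rows[0], rows[1:]
female_idx = header.index("Female")
female_values = [float(r[female_idx]) for r in body]
print("average of 'Female':", sum(female_values) / len(female_values))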

Hello,
Thanks for your interest in our work. No, we didn't fine-tune our model on the val or test sets; we made sure to filter them out before we trained the model.

We acknowledge that numerical reasoning questions are still quite challenging for the model. As you can see in the paper, performance on the human-written questions (which include numerical reasoning questions) is still very limited (~43%), so there is still a lot of room for improvement.
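
For context, that ~43% is under ChartQA's relaxed-accuracy metric, which accepts a numeric prediction within 5% of the gold value and requires an exact match otherwise. The helper below is only a rough sketch of that check, not the official evaluation script:

def relaxed_match(prediction: str, target: str, tolerance: float = 0.05) -> bool:
    """Rough sketch of ChartQA-style relaxed accuracy (not the official script).

    Numeric answers count as correct when within `tolerance` (5% by default)
    of the gold value; non-numeric answers must match exactly after lowercasing.
    """
    try:
        pred, gold = float(prediction), float(target)
    except ValueError:
        # Non-numeric answer: fall back to exact string matching
        return prediction.strip().lower() == target.strip().lower()
    if gold == 0.0:
        return pred == 0.0
    return abs(pred - gold) / abs(gold) <= tolerance

# e.g. relaxed_match("98.5", "100") -> True, relaxed_match("20000", "30000") -> False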