Does the pretraining corpus contain the test and val images in ChartQA?
Evanwu1125 opened this issue · comments
I recently benchmarked UniChart on some of my own questions about ChartQA images, but it did not perform as well as reported in the paper.
This is my test code, copied from Hugging Face:
from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch, os, re
model_name = "ahmed-masry/unichart-chartqa-960"
image_path = "../images/6.png"
input_prompt = "<chartqa> What is average value of all the 'Female' bars? <s_answer>"
model = VisionEncoderDecoderModel.from_pretrained(model_name)
processor = DonutProcessor.from_pretrained(model_name)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
image = Image.open(image_path).convert("RGB")
decoder_input_ids = processor.tokenizer(input_prompt, add_special_tokens=False, return_tensors="pt").input_ids
pixel_values = processor(image, return_tensors="pt").pixel_values
outputs = model.generate(
    pixel_values.to(device),
    decoder_input_ids=decoder_input_ids.to(device),
    max_length=model.decoder.config.max_position_embeddings,
    early_stopping=True,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
    num_beams=4,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
    return_dict_in_generate=True,
)
sequence = processor.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = sequence.split("<s_answer>")[1].strip()
print(sequence)
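The last three steps of the script (stripping special tokens and taking the text after `<s_answer>`) can be checked without loading the model. The following is only a minimal sketch: `extract_answer` is a hypothetical helper name, and the `</s>` and `<pad>` strings in the usage note are assumed placeholders, not values read from the UniChart tokenizer.

```python
def extract_answer(decoded: str, eos_token: str, pad_token: str,
                   answer_marker: str = "<s_answer>") -> str:
    """Strip special tokens and return the text after the answer marker.

    Mirrors the replace/split/strip steps in the script above, except that
    it keeps everything after the first marker and raises a clear error
    when the marker is missing instead of an IndexError.
    """
    cleaned = decoded.replace(eos_token, "").replace(pad_token, "")
    _, found, answer = cleaned.partition(answer_marker)
    if not found:
        raise ValueError(f"{answer_marker!r} not found in decoded output")
    return answer.strip()
```

In the script it would be called as `extract_answer(sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token)` on the raw output of `processor.batch_decode`.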
This is the chart I tested:
This is the answer UniChart returns:
20000
Hello,
Thanks for your interest in our work. No, we didn't finetune our model on the val or test sets. We made sure to filter them out before we trained the model.
We acknowledge that numerical reasoning questions are still quite challenging for the model. As you can see in the paper, the performance on the human-written questions (which include numerical reasoning questions) is still limited (~43%), so there is still a lot of room for improvement.
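When comparing your own results against the reported numbers, it may help to score answers the way ChartQA results are usually scored: relaxed accuracy, where a numeric answer counts as correct if it is within 5% of the gold value and any other answer must match exactly. The following is a minimal sketch under that assumption. `relaxed_match` is a hypothetical helper, not code from the UniChart repository, and the case-insensitive string comparison is a choice made for this sketch.

```python
from typing import Optional


def relaxed_match(prediction: str, target: str, tolerance: float = 0.05) -> bool:
    """Return True if prediction matches target under a relaxed criterion.

    Numeric answers may deviate from the target by up to `tolerance`
    (relative error); everything else is compared as a normalized string.
    """
    def to_float(text: str) -> Optional[float]:
        # Accept forms such as "20,000" or "43%"; return None if not numeric.
        try:
            return float(text.strip().rstrip("%").replace(",", ""))
        except ValueError:
            return None

    pred_num, target_num = to_float(prediction), to_float(target)
    if pred_num is not None and target_num is not None:
        if target_num == 0:
            # Relative error is undefined for a zero target; require equality.
            return pred_num == 0
        return abs(pred_num - target_num) / abs(target_num) <= tolerance
    return prediction.strip().lower() == target.strip().lower()
```

For example, a prediction of `20000` would be accepted for a gold answer of `20500` (about 2.4% off) but rejected for a gold answer of `25000`.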