Yui010206 / SeViLA

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

Home Page: https://arxiv.org/abs/2305.06988


What is the meaning of frame_num and answer_num?

aixiaodewugege opened this issue

Thanks for your brilliant work!

I can't find explanations for these two configuration options: frame_num and answer_num. Could you please help me?

Thanks for your interest in our work! Here are explanations for those parameters (see the sketch after this list):

  • model.frame_num: number of selected keyframes

  • datasets.nextqa.vis_processor.train.n_frms: number of candidate frames the keyframes are selected from

  • model.answer_num: number of multiple-choice options (e.g., NeXT-QA has 5 options per question, STAR has 4)
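
To make the relationship between the three settings concrete, here is a minimal sketch written as a plain Python dict. The nesting mirrors the dotted names above; the numeric values are illustrative assumptions, not the repo's actual defaults, so check the YAML configs in the repo for the real ones.

```python
# Minimal sketch (not the repo's config loader): the nesting mirrors the
# dotted parameter names above. Values are illustrative assumptions only.
config = {
    "model": {
        "frame_num": 4,    # keyframes the localizer keeps and hands to the answerer
        "answer_num": 5,   # multiple-choice options per question (5 for NeXT-QA, 4 for STAR)
    },
    "datasets": {
        "nextqa": {
            "vis_processor": {
                # candidate frames sampled from the video; the localizer
                # picks model.frame_num keyframes out of these n_frms
                "train": {"n_frms": 32},
            },
        },
    },
}
```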

Thanks for your reply!

I have tested the web demo extensively, but the zero-shot results on my dataset are not very good.

[screenshot of demo output]

I find the model always outputs option 1. Any idea what the problem might be? Also, I only have one GPU; is there any way to test it other than through the web demo?

We have instructions in this repo for running the Gradio demo locally and for running the evaluation.
SeViLA requires at least 12 GB of GPU memory to load the model and run inference with batch size 1.
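
As a quick sanity check for a single-GPU setup, this plain-PyTorch snippet (not part of SeViLA) reports free and total GPU memory, so you can verify the 12 GB requirement before loading the model:

```python
import torch

# Report free/total memory on the default CUDA device.
# torch.cuda.mem_get_info returns (free_bytes, total_bytes).
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"GPU memory: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB total")
else:
    print("No CUDA device available.")
```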

Sorry, I phrased that poorly. I already have it running locally with Gradio. What I meant is: does the model support a predict_answers() function like BLIP-2, so that I can run inference over a whole dataset and inspect the outputs?

Also, could you give me some guidance on using SeViLA without providing answer options? Should I change sevila.generate_demo to sevila.generate or sevila.predict_answers?

Yes, you can check and use the generate() function to test on multiple-choice QA datasets.
For open-ended answer generation, you can feed in only the question and decode the Flan-T5 output; check here. A rough sketch of both modes follows.
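
To illustrate the two modes, here is a minimal, non-authoritative sketch. Only generate(), generate_demo, and predict_answers are named in this thread; the loading helper load_sevila, the video loader load_video, the prompt templates, and the input-dict keys below are hypothetical placeholders, so consult the repo's evaluation code for the actual interface.

```python
import torch

# Hypothetical helpers standing in for however the repo builds the model and
# preprocesses video; these names are placeholders, not the real API.
sevila, vis_processor = load_sevila(device="cuda")     # placeholder loader
frames = vis_processor(load_video("my_clip.mp4"))      # placeholder preprocessing

# Mode 1 (multiple-choice): append the candidate options to the prompt and
# let generate() pick among them, as on multi-choice QA datasets.
mc_prompt = (
    "Question: What is the person doing? "
    "Option 1: cooking. Option 2: reading. Option 3: running. Answer:"
)

# Mode 2 (open-ended): pass only the question and decode the Flan-T5 output
# as free-form text, per the advice above.
open_prompt = "Question: What is the person doing? Answer:"

with torch.no_grad():
    for prompt in (mc_prompt, open_prompt):
        # "video" / "qa_input" are assumed key names for the input dict.
        out = sevila.generate({"video": frames.unsqueeze(0).cuda(),
                               "qa_input": prompt})
        print(out)
```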

I have the same question: when I feed the model samples from the NeXT-QA dataset, I always get option 1 in response.
[screenshot of model output]