Yui010206 / SeViLA

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

Home Page: https://arxiv.org/abs/2305.06988

Problem on NextQA Inference

Franklee95 opened this issue · comments

Hi,
When I use your SeViLA localizer trained on QVHighlights and run inference on the NExT-QA data, all predictions are option 1, and the key frames output by the localizer are identical (e.g., [0, 1, 2, 3]). What is the potential reason for this problem?

I have found the reason! I printed the output logits of the T5 model (sevila.py #587) and found that all of the tensors are NaN.
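A quick way to confirm this kind of failure is to check the logits for NaNs right where they are produced. This is a minimal illustrative sketch, not code from the SeViLA repo; the `logits` tensor here is a NaN-filled stand-in for the T5 output at sevila.py #587:

```python
import torch

# Stand-in for the T5 output logits; in the real model these would come
# from the forward pass around sevila.py #587.
logits = torch.full((1, 4), float("nan"))

# If any entry is NaN, every downstream argmax/softmax is meaningless,
# which is why all predictions collapse to the same option.
if torch.isnan(logits).any():
    print("NaN logits detected -- check the model's compute dtype")
```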

Thank God, I finally solved the problem! The real reason is that my GPU (V100) does not support torch.bfloat16. I had changed from bfloat16 to float16, which leads to numeric overflow and the NaN logits. Changing from bfloat16 to float32 instead gives the correct answers.
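The fix above can be made automatic by probing the hardware before choosing a dtype. This is a hypothetical helper, not part of the SeViLA codebase; it uses `torch.cuda.is_bf16_supported()` to fall back to float32 on GPUs like the V100:

```python
import torch

def pick_compute_dtype() -> torch.dtype:
    """Illustrative helper: choose a safe compute dtype for the model.

    bfloat16 needs Ampere-or-newer GPUs; the V100 (compute capability 7.0)
    does not support it natively.
    """
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return torch.bfloat16
    # float16 has a much narrower exponent range than bfloat16, so logits
    # can overflow to inf/NaN; float32 is the safe fallback on older GPUs.
    return torch.float32
```

Loading the T5 weights with the returned dtype avoids the NaN logits at the cost of extra memory when float32 is selected.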