[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
Home Page: https://arxiv.org/abs/2305.06988
zhl98 opened this issue a year ago
Hello. I see that the script uses the following configuration to load the model, but it contains no parameters for ViT and T5. How can I load them?
load_finetuned: True
finetuned: 'https://huggingface.co/Shoubin/SeViLA/resolve/main/sevila_pretrained.pth'
Hi, thanks for your interest in the SeViLA project. Please check this issue for details.
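For anyone who wants to check for themselves which components a downloaded checkpoint contains, one option is to group its parameter names by top-level prefix. The snippet below is only a minimal sketch: the helper names `count_params_by_prefix` and `load_checkpoint_keys` are made up for illustration, the example key names are hypothetical and not taken from the actual SeViLA checkpoint, and the nested "model" key is an assumption about the file layout.

```python
from collections import Counter


def count_params_by_prefix(state_dict_keys):
    """Count parameter names by their top-level module prefix."""
    return Counter(key.split(".", 1)[0] for key in state_dict_keys)


def load_checkpoint_keys(path):
    """Return the parameter names stored in a PyTorch checkpoint file."""
    import torch  # imported lazily so the helper above works without PyTorch

    checkpoint = torch.load(path, map_location="cpu")
    # Assumption: weights may be nested under a "model" key;
    # otherwise treat the loaded dict itself as the state dict.
    state_dict = checkpoint.get("model", checkpoint)
    return list(state_dict.keys())


# Hypothetical key names, for illustration only.
example_keys = [
    "visual_encoder.blocks.0.attn.qkv.weight",
    "Qformer.bert.encoder.layer.0.output.dense.weight",
    "Qformer.bert.encoder.layer.0.output.dense.bias",
    "t5_proj.weight",
]
summary = count_params_by_prefix(example_keys)
for prefix, count in sorted(summary.items()):
    print(f"{prefix}: {count}")
```

On a real file you would pass `load_checkpoint_keys("sevila_pretrained.pth")` to `count_params_by_prefix` instead of the example list. If no prefix corresponding to the ViT or T5 modules appears in the summary, those weights are presumably loaded from elsewhere rather than from this file.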