google-research / scenic

Scenic: A Jax Library for Computer Vision Research and Beyond


vid2seq | the way to evaluate the model on paragraph captioning

PKUCSS opened this issue

Thanks for the great work! I have a question about how the model is evaluated on paragraph captioning: do you fine-tune the pre-trained checkpoint on the paragraph captioning task, or simply remove the event boundary predictions from the outputs of the dense captioning model for evaluation on paragraph captioning?

@antoyang @a-nagrani Dear authors, thanks again for your great work. Could you please answer the question above, so that others can fairly compare against your work on video paragraph captioning?

If I recall correctly, I removed the event boundary predictions from the outputs of the dense captioning model. But fine-tuning the pre-trained model without time tokens should also work fine, given Vid2Seq's performance on video clip captioning benchmarks.
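For anyone wanting to replicate this post-processing, here is a minimal sketch (not the authors' code) of stripping event-boundary tokens from a dense captioning output to obtain a paragraph caption. The `<time=N>` token format and the `to_paragraph` helper are assumptions for illustration; adapt the regex to the actual time-token vocabulary your checkpoint uses.

```python
import re

def to_paragraph(dense_output: str) -> str:
    """Strip event-boundary (time) tokens from a dense captioning
    output string and join the remaining captions into one paragraph.

    Assumes a hypothetical "<time=N>" token format; the real
    Vid2Seq time-token vocabulary may differ.
    """
    # Remove every time token.
    text = re.sub(r"<time=\d+>", " ", dense_output)
    # Collapse the whitespace left behind.
    return re.sub(r"\s+", " ", text).strip()

example = ("<time=0> <time=12> a man opens a door. "
           "<time=12> <time=30> he walks outside.")
print(to_paragraph(example))  # -> a man opens a door. he walks outside.
```

The resulting paragraph can then be scored directly with standard captioning metrics (e.g. CIDEr, METEOR) against the reference paragraph.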

@antoyang Thanks for the quick response!