A large-scale robustness analysis of video-and-text multimodal models on the YouCook2 and MSRVTT datasets.