Repositories under the video-language topic:
[CVPR2022] Official Implementation of ReferFormer
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
This repository collects a wide variety of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also gathers useful tutorials and tools in these related domains.
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
A new multi-shot video understanding benchmark Shot2Story20K with detailed shot-level captions and comprehensive video summaries.
The PyTorch implementation of "Video-Text Pre-training with Learned Regions"
A survey on video and language understanding.
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
An end-to-end masked contrastive video-and-language pre-training framework
PyTorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
A Video Chat Agent with Temporal Prior
[ICCV 2023] The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model
The official GitHub page for the survey paper "Self-Supervised Learning for Videos: A Survey"
A repository of video-language papers, code, and datasets.