Repositories under the video-language topic:
[CVPR2022] Official Implementation of ReferFormer
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
This repository collects a wide variety of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also gathers useful tutorials and tools in these related domains.
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
A new multi-shot video understanding benchmark Shot2Story20K with detailed shot-level captions and comprehensive video summaries.
The PyTorch implementation of "Video-Text Pre-training with Learned Regions"
A survey on video and language understanding.
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
An end-to-end masked contrastive video-and-language pre-training framework
PyTorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
A Video Chat Agent with Temporal Prior
[ICCV 2023] The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model
The official GitHub page for the survey paper "Self-Supervised Learning for Videos: A Survey"
A repository of video-language papers, code, and datasets.