Repositories under the video-text-retrieval topic:
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insight.
An official implementation for " UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With our TWM, nine state-of-the-art models exhibit significant performance improvements across QA, captioning, and retrieval tasks.
[CVPR'2023 Highlight & TPAMI] Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
Research Code for Multimodal-Cognition Team in Ant Group
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
[Pattern Recognition 2025] Cross-Modal Adapter for Vision-Language Retrieval
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
Extracts text from a video and saves it as notes in a .docx file.
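Several of the repositories above (CLIP4Clip, X-CLIP, Cap4Video) rank videos against a text query by comparing a text embedding to aggregated frame embeddings. Below is a minimal sketch of that retrieval step, assuming the embeddings have already been produced by some encoder (the arrays here are random stand-ins, not real CLIP features); mean pooling of frame features is the simplest aggregation, as studied in CLIP4Clip.

```python
import numpy as np

def rank_videos(text_emb, video_frame_embs):
    """Rank candidate videos for one text query.

    text_emb:          (d,) L2-normalized text embedding.
    video_frame_embs:  list of (n_frames, d) frame embedding arrays.
    Returns (ranking, scores): indices sorted best-first, and the
    cosine score of each video.
    """
    scores = []
    for frames in video_frame_embs:
        video_emb = frames.mean(axis=0)              # mean-pool frames into one vector
        video_emb /= np.linalg.norm(video_emb)       # normalize so dot product = cosine
        scores.append(float(text_emb @ video_emb))
    ranking = np.argsort(scores)[::-1]               # highest similarity first
    return ranking, scores

# Toy usage with random embeddings (placeholders for real encoder output).
rng = np.random.default_rng(0)
text = rng.normal(size=512)
text /= np.linalg.norm(text)
videos = [rng.normal(size=(8, 512)) for _ in range(3)]
ranking, scores = rank_videos(text, videos)
```

Finer-grained approaches (e.g. X-CLIP's multi-grained contrast) replace the mean pooling with per-frame and per-word similarity matrices, but the ranking step stays the same.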