There are 16 repositories under the cross-modal-retrieval topic.
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
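The core idea behind CLIP-style retrieval, as used in repositories like the one above, is to embed images and sentences into a shared vector space and rank by cosine similarity. A minimal sketch of that ranking step, using toy NumPy vectors as stand-ins for real CLIP embeddings (the function name and the 4-dimensional embeddings are illustrative assumptions, not any repository's API):

```python
import numpy as np

def rank_images(text_emb, image_embs):
    # CLIP-style retrieval ranks candidates by cosine similarity,
    # so normalize every embedding to unit length first.
    t = text_emb / np.linalg.norm(text_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = imgs @ t  # cosine similarity of each image to the text
    # Indices of images sorted best-match first, plus raw similarities.
    return np.argsort(-sims), sims

# Toy 4-dim embeddings (real CLIP embeddings are 512-dim or larger).
text = np.array([1.0, 0.0, 0.0, 0.0])
images = np.array([
    [0.9, 0.1, 0.0, 0.0],  # close to the text embedding
    [0.0, 1.0, 0.0, 0.0],  # orthogonal to it
    [0.5, 0.5, 0.0, 0.0],
])
order, sims = rank_images(text, images)
print(order.tolist())  # → [0, 2, 1]
```

The same ranking works in the other direction (image-to-text) because both modalities live in one embedding space; only the roles of query and candidates swap.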
A paper list on cross-modal matching, pretraining, and transferring, for preliminary insight.
TOMM 2020 Dual-Path Convolutional Image-Text Embedding :feet: https://arxiv.org/abs/1711.05535
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
Offline semantic text-to-image and image-to-image search on Android, powered by a quantized state-of-the-art vision-language pretrained CLIP model and the ONNX Runtime inference engine.
Official implementation of "Contrastive Audio-Language Learning for Music" (ISMIR 2022)
[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
[CVPR 2020, Oral] "Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval", IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)
[BMVC 2021] Text-Based Person Search with Limited Data
Code, dataset and models for our CVPR 2022 publication "Text2Pos"
Source code for paper "Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval".
An unofficial implementation of the ECCV 2018 paper "Objects that Sound".
PyTorch implementation of the paper "VLDeformer: Vision Language Decomposed Transformer for Fast Cross-modal Retrieval", KBS 2022
Dataset and code for EMNLP 2022 "Visual Named Entity Linking: A New Dataset and A Baseline"
Reducing Semantic Confusion: Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval (ICMR'23 Oral)
The first research work on semantic localization.