There are 15 repositories under the image-text-retrieval topic.
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
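BLIP covers both understanding (e.g. image-text matching) and generation tasks. As a rough illustration of the matching side, here is a sketch that uses the Hugging Face transformers port of BLIP rather than this repo's own scripts; the checkpoint name, image path, and query text are assumptions.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, BlipForImageTextRetrieval

# Transformers port of BLIP's image-text matching (ITM) head.
processor = AutoProcessor.from_pretrained("Salesforce/blip-itm-base-coco")
model = BlipForImageTextRetrieval.from_pretrained("Salesforce/blip-itm-base-coco")

image = Image.open("photo.jpg").convert("RGB")  # hypothetical input image
inputs = processor(images=image, text="a dog running on a beach",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# The ITM head emits (no-match, match) logits for the pair.
match_prob = outputs.itm_score.softmax(dim=-1)[0, 1].item()
print(f"match probability: {match_prob:.3f}")
```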
A Chinese version of CLIP for Chinese cross-modal retrieval and representation generation.
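A minimal usage sketch, assuming the cn_clip package that ships with Chinese-CLIP; the model name, image path, and query texts are illustrative.

```python
import torch
from PIL import Image
import cn_clip.clip as clip
from cn_clip.clip import load_from_name

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = load_from_name("ViT-B-16", device=device)
model.eval()

image = preprocess(Image.open("demo.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["一只猫", "一只狗"]).to(device)  # "a cat", "a dog"
with torch.no_grad():
    image_features = model.encode_image(image)   # (1, dim)
    text_features = model.encode_text(texts)     # (2, dim)
    # Normalize, then rank texts by cosine similarity to the image.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    scores = image_features @ text_features.t()
```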
Official implementation of the ICASSP-2022 paper "Text2Poster: Laying Out Stylized Texts on Retrieved Images"
PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
Offline semantic text-to-image and image-to-image search on Android, powered by a quantized state-of-the-art CLIP vision-language model and the ONNX Runtime inference engine.
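The repository itself targets Android, but the core pipeline translates directly: quantized ONNX encoders plus cosine search over precomputed, normalized image embeddings. A Python sketch with onnxruntime; the model file name and input tensor name are hypothetical and depend on how the encoder was exported.

```python
import numpy as np
import onnxruntime as ort

# Quantized CLIP text encoder exported to ONNX (file name is an assumption).
text_session = ort.InferenceSession("clip_text_encoder.quant.onnx")

def embed_text(token_ids: np.ndarray) -> np.ndarray:
    # token_ids: (1, seq_len) int64 output of a CLIP tokenizer.
    (emb,) = text_session.run(None, {"input_ids": token_ids})
    return emb / np.linalg.norm(emb, axis=-1, keepdims=True)

def search(query_emb: np.ndarray, image_embs: np.ndarray, k: int = 5):
    # image_embs: (N, dim) L2-normalized gallery embeddings computed offline.
    scores = (image_embs @ query_emb.T).ravel()  # cosine on unit vectors
    return np.argsort(scores)[::-1][:k]
```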
Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"
Image captioning using Python and BLIP.
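A short captioning sketch using the Hugging Face transformers port of BLIP, one common way to run it from Python; the checkpoint name and image path are assumptions.

```python
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
)

image = Image.open("photo.jpg").convert("RGB")  # hypothetical input image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```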
Deploys Chinese CLIP with OpenCV and onnxruntime for text-to-image search: describe the desired image in a single sentence, and matching images are retrieved from the gallery. Includes both C++ and Python versions of the program.
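A sketch of the image-encoder half of that pipeline: OpenCV preprocessing feeding an ONNX image encoder. The model file and input tensor name are assumptions, and the normalization constants follow the standard OpenAI CLIP recipe.

```python
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("cn_clip_image_encoder.onnx")  # assumed file name

def embed_image(path: str) -> np.ndarray:
    img = cv2.imread(path)                       # BGR, HWC, uint8
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # CLIP expects RGB
    img = cv2.resize(img, (224, 224)).astype(np.float32) / 255.0
    mean = np.array([0.48145466, 0.4578275, 0.40821073], np.float32)
    std = np.array([0.26862954, 0.26130258, 0.27577711], np.float32)
    img = (img - mean) / std
    img = img.transpose(2, 0, 1)[None]           # HWC -> NCHW, add batch dim
    (emb,) = session.run(None, {"image": img})   # tensor name is an assumption
    return emb / np.linalg.norm(emb)
```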
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
We implement and compare several cross-modal learning schemes, including the Siamese Network, the Correlational Network, and the Deep Cross-Modal Projection Learning model. We also propose a modified Deep Cross-Modal Projection Learning model that uses a different image feature extractor, and we evaluate image-text retrieval performance on a fashion clothing dataset.
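The common thread in these schemes is projecting image and text features into a shared embedding space. A minimal PyTorch sketch of such a two-tower projection model; the layer sizes and feature dimensions are illustrative assumptions, not the repo's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionTower(nn.Module):
    """Projects pre-extracted features into a shared embedding space."""
    def __init__(self, in_dim: int, embed_dim: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, embed_dim)
        )

    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)  # unit vectors

image_tower = ProjectionTower(in_dim=2048)  # e.g. pooled ResNet features
text_tower = ProjectionTower(in_dim=768)    # e.g. BERT [CLS] features

img_emb = image_tower(torch.randn(8, 2048))
txt_emb = text_tower(torch.randn(8, 768))
logits = img_emb @ txt_emb.t() / 0.07            # temperature-scaled similarity
# Matched image-text pairs sit on the diagonal of the batch.
loss = F.cross_entropy(logits, torch.arange(8))
```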
A list of research papers on knowledge-enhanced multimodal learning
Searching Images: From CLIP and Beyond
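As a baseline for CLIP-based image search, a sketch using the transformers CLIP implementation; the checkpoint name, gallery paths, and query are illustrative.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paths = ["cat.jpg", "dog.jpg", "car.jpg"]        # hypothetical gallery
images = [Image.open(p).convert("RGB") for p in paths]
inputs = processor(text=["a photo of a dog"], images=images,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
# logits_per_text: (num_texts, num_images) similarity scores
best = outputs.logits_per_text.argmax(dim=-1).item()
print("best match:", paths[best])
```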
A unified codebase for image-text retrieval, intended for further exploration.
Repository for the Modern Image Search course of the Super AI Engineer Development Program SS4.
Matching questions to correct answers using pre-trained BERT models.
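One common way to do this is a BERT-family bi-encoder via sentence-transformers: embed the question and the candidate answers, then rank by cosine similarity. The model name below is illustrative.

```python
from sentence_transformers import SentenceTransformer, util

# BERT-family bi-encoder tuned for question-answer retrieval.
model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")

question = "What is the capital of France?"
answers = [
    "Paris is the capital and largest city of France.",
    "Berlin is the capital of Germany.",
]
q_emb = model.encode(question, convert_to_tensor=True)
a_emb = model.encode(answers, convert_to_tensor=True)
scores = util.cos_sim(q_emb, a_emb)          # (1, num_answers)
print(answers[int(scores.argmax())])
```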