There is 1 repository under the image-text topic.
Code for ALBEF: a new vision-language pre-training method
Data release for the ImageInWords (IIW) paper.
Deep Cross-Modal Projection Learning for Image-Text Matching
A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
Wrapper for PHP's GD Library for easy image manipulation. Support for scaling multi-line text, shapes, filters and smart resize.
Keras implementation of ImageBERT from Microsoft
WWDC22: Enabling Live Text interactions with images in SwiftUI
A server powering LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
Download flickr8k, flickr30k image caption datasets
This project is a FastAPI-based web application designed to analyze Cambridge IELTS PDFs (Books 1-18) for the most and least repeated words. It can handle both regular text-based PDFs and scanned image-based PDFs by converting them to images and extracting text using OCR (Optical Character Recognition).
caption generator using lavis and argostranslate
The first public Vietnamese visual linguistic foundation model(s)
Write text on images with PHP
Contrastive Learning Representations for Images and Text Pairs. Colab implementation of ConVIRT for transfer learning with insufficient data volume.
Image Captioning With MobileNet-LLaMA 3
MTA: A Lightweight Multilingual Text Alignment Model for Cross-language Visual Word Sense Disambiguation
PolCLIP: A Unified Image-Text Word Sense Disambiguation Model via Generating Multimodal Complementary Representations
Some Python scripts to load Vietnamese visual linguistic data
Raster graphics package for Fōrmulæ, in JavaScript
lmmtoolkit is a toolkit for Multi-Modal Learning
Character Recognition system using CNN and Streamlit
Text-Image-Text is a bidirectional system that enables seamless retrieval of images based on text descriptions, and vice versa. It leverages state-of-the-art language and vision models to bridge the gap between textual and visual representations.
10000-Image-caption-data-of-diverse-scenes
10000-Image-caption-data-of-gestures
10000-Image-caption-data-of-vehicles
10100-Image-caption-data-of-human-face
11000-Image-Video-caption-data-of-human-action
20011--Image-Caption-Data-Of-OCR-In-Natural-Scenes
The official code for the paper "Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking", ACM Multimedia 2019 Oral