image-text

There are 36 repositories under the image-text topic.

salesforce / ALBEF
Code for ALBEF: a new vision-language pre-training method
vision-and-language representation-learning image-text weakly-supervised-learning contrastive-learning
Language:Python 1508
Sense-GVT / DeCLIP
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
big-model clip image-text multi-model self-supervised vision-language-pretraining zero-shot
Language:Python 628
google / imageinwords
Data release for the ImageInWords (IIW) paper.
dataset dataset-generation detailed-annotations detailed-descriptions evaluation human-annotation i2t image-captioning image-descriptions image-text image-to-text t2i
Language:JavaScript 194
X-PLUG / mPLUG
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
image-captioning image-text image-text-retrieval multimodal pretraining pytorch transformer visual-language vqa
Language:Python 81
labyrinth7x / Deep-Cross-Modal-Projection-Learning-for-Image-Text-Matching
Deep Cross-Modal Projection Learning for Image-Text Matching
image-text
Language:Python 72
glami / glami-1m
The largest multilingual image-text classification dataset. It contains fashion products.
computer-vision dataset deep-learning fashion image-text image-to-text multilingual multimodal natural-language-processing classification image-text-classification multilingual-image-text-classification image-classification text-classification text-to-image-generation multi-modal-deep-learning
Language:Jupyter Notebook 68
miccunifi / QualiCLIP
Quality-Aware Image-Text Alignment for Real-World Image Quality Assessment
biqa blind-image-quality-assessment clip computer-vision deep-learning image-degradation image-processing image-quality image-quality-assessment image-text iqa low-level-vision no-reference-image-quality-assessment nr-iqa opinion-unaware opinion-unaware-nr-iqa ranking-loss self-supervised-learning vision-language
Language:Python 38
TheoCoombes / crawlingathome
A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
dataset machine-learning dall-e image-text clip dataset-generation
Language:Python 31
zhangming8 / ocr_algo_server
OCR text recognition algorithm service
text-recognize ocr python image-text
Language:C++ 22
antonlukin / poster-editor
Wrapper for PHP's GD Library for easy image manipulation. Support for scaling multi-line text, shapes, filters and smart resize.
composer image-processing image-text intervention php php-class php-gd php-image php-library poster-editor
Language:PHP 18
zabir-nabil / imagebert-keras
Keras implementation of ImageBERT from Microsoft
imagebert keras image-text
14
HuangRunHua / LiveTextWithImage
WWDC22: Enabling Live Text interactions with images in SwiftUI
image-processing image-text live-text swift swiftui swiftui-demo swiftui-example wwdc wwdc22
Language:Swift 13
TheoCoombes / crawlingathome-server
A server powering LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
machine-learning image-text dataset-generation dataset clip dall-e
Language:Python 13
awsaf49 / flickr-dataset
Download flickr8k, flickr30k image caption datasets
captioning-images clip image image-text siglip dataset flickr flickr30k flickr8k
5
dvlab-research / TagCLIP
clip image-text segmentation zero-shot
Language:Python 5
fatemeh-mohseni-AI / most-repeated-vocabulary-IELTS
This project is a FastAPI-based web application designed to analyze Cambridge IELTS PDFs (Books 1-18) for the most and least repeated words. It can handle both regular text-based PDFs and scanned image-based PDFs by converting them to images and extracting text using OCR (Optical Character Recognition).
fast-api ielts image-text
Language:Python 5
leeyunjai / image2text
caption generator using lavis and argostranslate
caption caption-generation caption-generator captioning-images captions image-analysis image-text img2txt blip2
Language:Python 4
dinhanhx / VisualRoBERTa
The first public Vietnamese visual linguistic foundation model(s)
python python-3 python3 image-captioning image-text vietnamese-nlp visual-linguistic visual-question-answering
Language:Python 3
dngo-io / cover-creator
Write text on images with PHP
php image-manipulation image-text image-processing textview
Language:PHP 3
waittim / ConVIRT-Colab
Contrastive Learning Representations for Images and Text Pairs. Colab implementation of ConVIRT for transfer learning with insufficient data volume.
contrastive-learning colab image-text
Language:Jupyter Notebook 3
reshalfahsi / image-captioning-mobilenet-llama3
Image Captioning With MobileNet-LLaMA 3
image-captioning llama3 mobilenetv3 pytorch pytorch-lightning image-text kv-cache rotary-position-embedding cnn grouped-query-attention rms-norm transformer flickr8k-dataset nlp
Language:Jupyter Notebook 2
CharlesYang030 / MTA
MTA: A Lightweight Multilingual Text Alignment Model for Cross-language Visual Word Sense Disambiguation
image-text language-vision multilingual multimodal visualwsd
Language:Jupyter Notebook 1
CharlesYang030 / PolCLIP
PolCLIP: A Unified Image-Text Word Sense Disambiguation Model via Generating Multimodal Complementary Representations
image-text multimodal-wsd
Language:Jupyter Notebook 1
dinhanhx / VL-datasets
Some Python scripts to load Vietnamese visual linguistic data
image-captioning image-text python python-3 python3 vietnamese vietnamese-nlp visual-linguistic visual-question-answering
Language:Python 1
formulae-org / package-graphic-raster-js
Raster graphics package for Fōrmulæ, in JavaScript
formulae graphics javascript raster-graphics graphic-primitives graphics-programming image-colors image-coordinates image-text image-transformations rotating stroke-imaging turtle-graphics xor-mode
Language:JavaScript 1
jianzhnie / MultimodalTransformers
lmmtoolkit is a toolkit for Multi-Modal Learning
image-text multi-modal-learning text-image text-to-video
Language:Python 1
AkshayBura / Character-Recognition
Character Recognition system using CNN and Streamlit
cnn deep-neural-networks image-processing image-text preprocessing python recognizing-characters streamlit tensorflow
Language:Jupyter Notebook 0
DarkKnightSgh / Text-Image-Text
Text-Image-Text is a bidirectional system that enables seamless retrieval of images based on text descriptions, and vice versa. It leverages state-of-the-art language and vision models to bridge the gap between textual and visual representations.
flickr8k-dataset image-text information-retrieval python semantic-embedding streamlit text-image transformers huggingface-transformers
Language:Python 0
ppraneeth270 / img2text
image-text image2text textrecognition
Language:Python 0
Nexdata-AI / 10000-Image-caption-data-of-diverse-scenes
10000-Image-caption-data-of-diverse-scenes
caption-data image-recognition scene-recognition generative-ai image-text
Nexdata-AI / 10000-Image-caption-data-of-gestures
10000-Image-caption-data-of-gestures
asian caption-data gesture-recognition generative-ai image-text
Nexdata-AI / 10000-Image-caption-data-of-vehicles
10000-Image-caption-data-of-vehicles
caption-data image-recognition vehicle-detection generative-ai image-text
Nexdata-AI / 10100-Image-caption-data-of-human-face
10100-Image-caption-data-of-human-face
caption-data human-face-recognition image-recognition generative-ai image-text
Nexdata-AI / 11000-Image-Video-caption-data-of-human-action
11000-Image-Video-caption-data-of-human-action
caption-data computer-vision human-action-recognition aigc generative-ai image-text text-image
Nexdata-AI / 20011--Image-Caption-Data-Of-OCR-In-Natural-Scenes
20011--Image-Caption-Data-Of-OCR-In-Natural-Scenes
caption-data natural-scenes ocr generative-ai image-text text-image
xiongshufeng / MTFN-RR-PyTorch-Code
The official code for the paper "Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking", ACM Multimedia 2019 Oral
fusion image-text
Language:Python
