There are 49 repositories under the image-captioning topic.
LAVIS - A One-stop Library for Language-Vision Intelligence
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. Demos: https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Simple Swift class providing all the configurations you need to create a custom camera view in your app
Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, among other methods.
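Self-critical sequence training optimizes a non-differentiable caption metric (e.g. CIDEr) with REINFORCE, using the greedy-decoded caption's reward as the baseline. A minimal sketch of the loss term; `scst_loss` and its argument names are illustrative, not the repository's actual API:

```python
def scst_loss(sample_logprobs, sample_reward, baseline_reward):
    """Self-critical sequence training loss (sketch).

    sample_logprobs: per-token log-probabilities of a sampled caption
    sample_reward:   metric score (e.g. CIDEr) of the sampled caption
    baseline_reward: score of the greedy-decoded caption (the baseline)
    """
    # REINFORCE with a self-critical baseline: increase the likelihood of
    # sampled captions that score better than the greedy caption, decrease
    # it when they score worse.
    advantage = sample_reward - baseline_reward
    return -advantage * sum(sample_logprobs)
```

In a real training loop this value would be averaged over a batch and backpropagated through the log-probabilities.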
TensorFlow Implementation of "Show, Attend and Tell"
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Meshed-Memory Transformer for Image Captioning. CVPR 2020
An open-source tool for sequence learning in NLP built on TensorFlow.
Complete Assignments for CS231n: Convolutional Neural Networks for Visual Recognition
Implementation of "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"
Image Captioning using InceptionV3 and beam search
A reverse image search engine powered by Elasticsearch and TensorFlow
Transformer-based image captioning extension for pytorch/fairseq
Video to Text: Natural language description generator for some given video. [Video Captioning]
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
A neural network that generates captions for an image using a CNN encoder and an RNN decoder with beam search.
Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]
A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.
Automatic image captioning model based on Caffe, using features from bottom-up attention.
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
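A CLIP reward scores a caption by how well its text embedding aligns with the image embedding, typically via cosine similarity. A minimal sketch of that scoring step with plain vectors; in practice the embeddings would come from a pretrained CLIP model, and this helper name is illustrative:

```python
import math

def clip_style_reward(image_emb, text_emb):
    """Cosine similarity between an image embedding and a caption embedding.

    Both inputs are plain lists of floats standing in for CLIP features;
    higher values mean the caption describes the image more closely.
    """
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_img = math.sqrt(sum(a * a for a in image_emb))
    norm_txt = math.sqrt(sum(b * b for b in text_emb))
    return dot / (norm_img * norm_txt)
```

This scalar can then serve as the reward in a policy-gradient objective such as the self-critical loss above.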