There are 49 repositories under the image-captioning topic.
LAVIS - A One-stop Library for Language-Vision Intelligence
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. Demos: https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Simple Swift class providing all the configurations you need to create a custom camera view in your app
Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, among other methods.
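Self-critical sequence training optimizes a non-differentiable caption metric (e.g. CIDEr) with REINFORCE, using the greedy-decoded caption's reward as the baseline. A minimal sketch of the loss term; `scst_loss` and its argument names are illustrative, not the repository's actual API:

```python
def scst_loss(sample_logprobs, sample_reward, baseline_reward):
    """Self-critical sequence training loss (sketch).

    sample_logprobs: per-token log-probabilities of a sampled caption
    sample_reward:   metric score (e.g. CIDEr) of the sampled caption
    baseline_reward: score of the greedy-decoded caption (the baseline)
    """
    # REINFORCE with a self-critical baseline: increase the likelihood of
    # sampled captions that score better than the greedy caption, decrease
    # it when they score worse.
    advantage = sample_reward - baseline_reward
    return -advantage * sum(sample_logprobs)
```

In a real training loop this value would be averaged over a batch and backpropagated through the log-probabilities.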
TensorFlow Implementation of "Show, Attend and Tell"
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Meshed-Memory Transformer for Image Captioning. CVPR 2020
An open-source tool for sequence learning in NLP built on TensorFlow.
Complete Assignments for CS231n: Convolutional Neural Networks for Visual Recognition
Implementation of "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"
Image Captioning using InceptionV3 and beam search
A reverse image search engine powered by Elasticsearch and TensorFlow
Transformer-based image captioning extension for pytorch/fairseq
Video to Text: Natural language description generator for some given video. [Video Captioning]
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
A neural network that generates captions for an image using a CNN encoder and an RNN decoder with beam search.
Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]
A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.
Automatic image captioning model based on Caffe, using features from bottom-up attention.
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
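A CLIP reward scores a caption by how well its text embedding aligns with the image embedding, typically via cosine similarity. A minimal sketch of that scoring step with plain vectors; in practice the embeddings would come from a pretrained CLIP model, and this helper name is illustrative:

```python
import math

def clip_style_reward(image_emb, text_emb):
    """Cosine similarity between an image embedding and a caption embedding.

    Both inputs are plain lists of floats standing in for CLIP features;
    higher values mean the caption describes the image more closely.
    """
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_img = math.sqrt(sum(a * a for a in image_emb))
    norm_txt = math.sqrt(sum(b * b for b in text_emb))
    return dot / (norm_img * norm_txt)
```

This scalar can then serve as the reward in a policy-gradient objective such as the self-critical loss above.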