captioning

Repositories listed under the captioning topic:

facebookresearch / mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
captioning deep-learning dialog hateful-memes multi-tasking multimodal pretrained-models pytorch textvqa vqa
Language:Python 5598
roboflow / maestro
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
captioning fine-tuning florence-2 multimodal objectdetection paligemma phi-3-vision qwen2-vl transformers vision-and-language vqa
Language:Python 2643
fpgaminer / joycaption
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
captioning joycaption vlm
Language:Jupyter Notebook 938
ltguo19 / VSUA-Captioning
Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019
captioning deep-learning language-generation nlp pytorch
Language:Python 258
DavidHuji / CapDec
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
captioning clip gpt-2 multimodal-deep-learning zero-shot-learning clipcap
Language:Python 201
Labbeti / aac-datasets
Audio Captioning datasets for PyTorch.
pytorch audio caption datasets captioning audio-captioning dataset deep-learning
Language:Python 122
HaydenFaulkner / Tennis
A Tennis dataset and models for event detection & commentary generation
tennis machine-learning computer-vision sportsanalytics dataset fine-grained eventdetection captioning video mxnet gluon
Language:Python 110
mitvis / vistext
VisText is a benchmark dataset for semantically rich chart captioning.
captioning captioning-images charts dataset t5
Language:Jupyter Notebook 95
drethage / fully-convolutional-point-network
Fully-Convolutional Point Networks for Large-Scale Point Clouds
3d captioning computer-vision deep-learning deep-neural-networks meshes point-cloud point-clouds semantic-segmentation
Language:Python 86
audio-captioning / clotho-dataset
Python code for handling the Clotho dataset.
audio-captioning audio audio-signal-processing deep-learning natural-language-processing captioning clotho-dataset
Language:Python 85
Mauville / MedCLIP
Medical image captioning using OpenAI's CLIP
captioning clip deep-learning machine-learning medical-imaging what-a-challenge-this-was
Language:Jupyter Notebook 85
wangleihitcs / MedicalReportGeneration
A Base Tensorflow Project for Medical Report Generation
tensorflow-models captioning medical-report-generate
Language:Python 70
ParitoshParmar / MTL-AQA
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
action-quality-assessment mtl-aqa multitask-learning video-understanding video-processing video-captioning fine-grained-classification pytorch action-recognition fine-grained-action-recognition representation-learning c3d dilated-convolution dilated-c3d lstm captioning
Language:Python 69
aimagelab / pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
captioning captioning-images captioning-videos computer-vision cvpr cvpr2023 vision-and-language
Language:Python 64
Labbeti / aac-metrics
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
audio audio-captioning captioning metrics text
Language:Python 63
TheShadow29 / VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
captioning captioning-videos event-relations grounding nlp semantic-roles srl video video-language vision vision-and-language
Language:Python 61
lucidrains / AoA-pytorch
A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering
attention attention-mechanism vqa visual-question-answering captioning
Language:Python 43
DavidMChan / caption-by-committee
Using LLMs and pre-trained caption models for super-human performance on image captioning.
ai captioning chatgpt deep-learning image machine-learning python
Language:Python 42
audio-captioning / dcase-2020-baseline
Audio captioning baseline system for DCASE 2020 challenge.
audio-captioning audio-signal-processing captioning deep-learning deep-neural-networks machine-listening machine-learning signal-processing dcase2020 dcase
Language:Python 38
CurryYuan / X-Trans2Cap
[CVPR 2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
captioning cvpr2022
Language:Python 36
aimagelab / camel
CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022
image-captioning captioning-images captioning computer-vision artificial-intelligence pytorch
Language:Python 29
ebu / ebu-tt-live-toolkit
Toolkit for supporting the EBU-TT Live specification
ebu-tt python subtitles captions subtitling captioning live broadcast video
Language:Python 26
RyanLiut / awesome-diverse-captioning
Some papers about *diverse* image (a few videos) captioning
diversity captioning
26
alecwangcq / show-attend-and-tell
captioning
Language:Jupyter Notebook 25
elbayadm / PaperNotes
My notes on some Deep Learning papers
captioning deep-learning paper-notes papers seq2seq
Language:HTML 24
FeiElysia / awesome-zero-shot-captioning
A curated list of zero-shot captioning papers
captioning image-to-text video-to-text zero-shot
24
AdrianHsu / S2VT-seq2seq-video-captioning-attention
S2VT (seq2seq) video captioning with bahdanau & luong attention implementation in Tensorflow
attention-mechanism captioning deep-learning seq2seq tensorflow video
Language:Python 19
aimagelab / PMA-Net
[ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.
captioning captioning-images iccv2023 image-captioning memory-augmented-neural-networks transformer vision-and-language vision-language
Language:Python 19
audio-captioning / caption-evaluation-tools
Tools for the evaluation of audio captioning.
audio-captioning machine-translation-metrics captioning
Language:Jupyter Notebook 18
hassanhub / R3Transformer
Official python implementation of R3-Transformer
captioning r3-transformer transformer
Language:Python 15
nssharmaofficial / reddit-hole
Automated reddit scraper and video creator
amazon-polly amazon-polly-api automation aws captioning openai openai-whisper reddit reddit-bot reddit-crawler reddit-scraper tts whisper
Language:Python 14
2dameneko / ide-cap-chan
ide-cap-chan is a utility for batch image captioning with natural language using various VL models
ai batch captioning llm ml python vlm gpu joycaption llava mistral molmo multigpu nvidia pixtral qwen idefics3
Language:Python 13
ImKeTT / ZeroGen
[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation
captioning controllable-text-generation decoding gpt2 multimodal nlpcc vision-language zero-shot
Language:Python 13
rayandrew / indonesian-image-captioning
Indonesian Image Captioning using Attention-based Semantic Compositional Networks
attention captioning image-captioning indonesia indonesian pytorch resnet
Language:Jupyter Notebook 13
ZhaoPeiduo / BLIP2-Japanese
Modifying LAVIS' BLIP2 Q-former with models pretrained on Japanese datasets.
captioning japanese pytorch blip2 multimodal-deep-learning
Language:Python 13
mrazhou / SEN
Single-stream Extractor Network with Contrastive Pre-training for Remote Sensing Change Captioning
change-captioning deep-learning pytorch captioning image-captioning remote-sensing
Language:Python 10
