video-captioning

There are 33 repositories under the video-captioning topic.

YehLi / xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
cross-modal-retrieval image-captioning pretraining tden video-captioning vision-and-language visual-question-answering
Language:Python 970
xiadingZ / video-caption.pytorch
PyTorch implementation of video captioning
deep-learning pytorch video-captioning
Language:Python 400
scopeInfinity / Video2Description
Video to Text: Natural language description generator for some given video. [Video Captioning]
deep-neural-networks cnn-keras lstm-neural-networks image-captioning video-captioning video-processing audio-processing video-to-text
Language:Python 343
xid32 / NAACL_2025_TWM
We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of Multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With our TWM, nine state-of-the-art models exhibit significant performance improvements across QA, captioning, and retrieval tasks.
multimodal-foundation-model multimodal-large-language-models audio-visual-learning question-answering video-captioning video-text-retrieval working-memory
Language:Python 308
tomchang25 / whisper-auto-transcribe
Automatic transcription tool based on Whisper
asr deep-learning gradio gradio-interface language-model pytorch speech-processing speech-recognition speech-to-text text-to-speech video-captioning voice-activity-detection
Language:Python 225
antoyang / VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
dense-video-captioning multimodal-learning pre-training temporal-language-grounding vid2seq video-captioning video-chapter-generation video-understanding vision-and-language weakly-supervised-learning
Language:Jupyter Notebook 188
jayleicn / recurrent-transformer
[ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
pytorch video-captioning activitynet-captions youcook2
Language:Jupyter Notebook 169
vijayvee / video-captioning
This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the video.
video-captioning tensorflow s2vt sequence-to-sequence multimodal-deep-learning seq2seq
Language:Python 165
JasonYao81000 / MLDS2018SPRING
Machine Learning and having it Deep and Structured (MLDS) in 2018 spring
mlds2018spring ntu hung-yi-lee mlds seq2seq sequence-to-sequence gan generative-adversarial-network reinforcement-learning policy-gradient deep-q-network actor-critic chat-bot chatbot chinese-chatbot video-captioning image-generation text-to-image 2018 spring
Language:Python 145
jpthu17 / EMCL
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
cross-modal-retrieval neurips video-captioning video-question-answering video-retrieval
Language:Python 130
bytedance / Shot2Story
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
benchmark dataset large-language-models video-language video-language-pretraining video-question-answering video-summarization vision-language video-captioning video-story video-story-generation research
Language:Python 124
jssprz / video_captioning_datasets
Summary of video-to-text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
video-captioning video-description vision-and-language video-dataset video-to-text msvd msr-vtt activitynet-captions trecvid charades vatex tgif-dataset review state-of-the-art
Language:Jupyter Notebook 121
terry-r123 / Awesome-Captioning
A curated list of multimodal captioning research (including image captioning, video captioning, and text captioning)
image-captioning video-captioning text-captioning
110
jayleicn / TVCaption
[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset
dataset pytorch video-captioning
Language:Python 90
Kamino666 / Video-Captioning-Transformer
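A deep learning model for video captioning, implemented in PyTorch with a Transformer architecture. The video captioning task takes a video as input and outputs one sentence describing the content of the whole video (assuming the video is short and can be described in a single sentence). The main goal of this repo is to help visually impaired people enjoy online videos and perceive their surroundings, and to promote the development of "accessible video".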
pytorch transformer video-captioning
Language:Python 85
nasib-ullah / video-captioning-models-in-Pytorch
A PyTorch implementation of state-of-the-art video captioning models from 2015-2019 on the MSVD and MSRVTT datasets.
video-captioning deep-learning sequence-to-sequence msvd msrvtt s2vt pytorch pytorch-implementation video-captioning-models video marn recnet
Language:Python 70
ParitoshParmar / MTL-AQA
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
action-quality-assessment mtl-aqa multitask-learning video-understanding video-processing video-captioning fine-grained-classification pytorch action-recognition fine-grained-action-recognition representation-learning c3d dilated-convolution dilated-c3d lstm captioning
Language:Python 66
UARK-AICV / VLTinT
[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
aaai2023 pytorch transformer-architecture video-captioning video-paragraph-captioning vision-language
Language:Jupyter Notebook 66
amazon-science / crossmodal-contrastive-learning
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
computer-vision contrastive-learning multi-modality natural-language-processing transformers video video-captioning video-text-retrieval
Language:Python 62
jacobswan1 / Video2Commonsense
Video captioning baseline models on Video2Commonsense Dataset.
commonsense-question-answering commonsense-story video-captioning video2commonsense
Language:Python 56
lvapeab / ABiViRNet
Attention Bidirectional Video Recurrent Net
attention-mechanism deep-learning keras lstm python tensorflow theano video-captioning
Language:Python 56
imshaikot / srt-webvtt
Convert SRT formatted subtitle to WebVTT on the fly over HTML5/browser environment
video video-captioning web-vtt srt-subtitles converter html5 html5-video
Language:TypeScript 51
pochih / Video-Cap
🎬 Video Captioning: ICCV '15 paper implementation
seq2seq video-captioning attention-mechanism tnesorflow deep-learning nlp computer-vision
Language:Python 47
TXH-mercury / COSA
[ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
video-captioning video-language-pretrainng video-qa video-retrieval vision-language-pretraining
Language:Python 42
LuoweiZhou / densecap
Dense video captioning in PyTorch
dense-video-captioning youcook2 activitynet-captions video-captioning transformer
Language:Jupyter Notebook 41
tsujuifu / pytorch_empirical-mvm
A PyTorch implementation of EmpiricalMVM
cvpr2023 pytorch pre-training video-captioning video-question-answering video-retrieval vision-and-language
Language:Python 40
acherstyx / CoCap
[ICCV 2023] Accurate and Fast Compressed Video Captioning
compressed-video iccv2023 video-captioning
Language:Python 39
WingsBrokenAngel / delving-deeper-into-the-decoder-for-video-captioning
Source code for Delving Deeper into the Decoder for Video Captioning
tensorflow video-captioning msvd msr-vtt state-of-the-art semantics professional-learning decoder
Language:Jupyter Notebook 39
willyfh / awesome-video-text-datasets
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
dataset video-captioning video-description video-text video-to-text vision-language video-language video-retrieval
36
xiadingZ / video-caption-openNMT.pytorch
Video captioning implementation based on OpenNMT
pytorch video-captioning
Language:Python 36
mlvlab / MELTR
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)
cvpr2023 meta-learning multi-modal video-captioning video-question-answering video-retrieval
Language:Python 33
jssprz / visual_syntactic_embedding_video_captioning
Source code of the paper titled *Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding*
video-captioning msvd msr-vtt wacv2021 deep-learning pos-tagging representation-learning encoder-decoder syntactic-representations video-description video-to-text
Language:Python 30
zjr2000 / LLMVA-GEBC
Winning solution to the Generic Event Boundary Captioning task in the LOVEU Challenge (CVPR 2023 workshop)
long-video-understanding pytorch-implementation video-captioning
Language:Python 29
UARK-AICV / VLCAP
[ICIP 2022 oral] VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning
contrastive-learning transformer video-captioning vision-and-language
Language:Jupyter Notebook 28
yangbang18 / CARE
(TIP'2023) Concept-Aware Video Captioning: Describing Videos with Effective Prior Information
concept-detection pytorch video-captioning
Language:Jupyter Notebook 27
rohit-gupta / Video2Language
Generating video descriptions using deep learning in Keras
deep-learning computer-vision natural-language-processing keras keras-models deep-video-analytics video-captioning video-to-text
Language:Python 25
