There are 32 repositories under the video-captioning topic.
PyTorch implementation of video captioning
Video to Text: Natural-language description generator for a given video. [Video Captioning]
Auto-transcription tool based on Whisper
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
[ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
This repository contains the code for a video captioning system inspired by *Sequence to Sequence -- Video to Text*. The system takes a video as input and generates an English caption describing it.
Machine Learning and having it Deep and Structured (MLDS), Spring 2018
Summary of video-to-text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
A curated list of multimodal captioning research (including image captioning, video captioning, and text captioning)
A new multi-shot video understanding benchmark, Shot2Story, with comprehensive video summaries and detailed shot-level captions.
A deep learning model for video captioning, implemented in PyTorch with a Transformer architecture. The video captioning task takes a video as input and outputs a single sentence describing the content of the whole video (assuming the video is short enough to be described in one sentence). The main goal of this repo is to help visually impaired people enjoy online videos and perceive their surroundings, and to promote the development of "accessible video".
A PyTorch implementation of state-of-the-art video captioning models from 2015-2019 on the MSVD and MSRVTT datasets.
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
Video captioning baseline models on Video2Commonsense Dataset.
Convert SRT-formatted subtitles to WebVTT on the fly in an HTML5/browser environment
Dense video captioning in PyTorch
A PyTorch implementation of EmpiricalMVM
Code and models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Source code for Delving Deeper into the Decoder for Video Captioning
Video captioning implementation based on OpenNMT
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
Source code of the paper titled *Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding*
Winning solution for the Generic Event Boundary Captioning task in the LOVEU Challenge (CVPR 2023 workshop)
Generating video descriptions using deep learning in Keras
(TIP'2023) Concept-Aware Video Captioning: Describing Videos with Effective Prior Information
Deep learning coursework for ADLxMLDS (CSIE 5431) at NTU