There are 14 repositories under the msvd topic.
A summary of Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*.
PyTorch code for *Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners*
A PyTorch implementation of state-of-the-art video captioning models (2015-2019) on the MSVD and MSR-VTT datasets.
[ACM MM 2017 & IEEE TMM 2020] This is the Theano code for the paper "Video Description with Spatial Temporal Attention"
Source code for Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling Strategy
Source code for Delving Deeper into the Decoder for Video Captioning
Source code of the paper titled *Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding*
Python implementation for extracting several visual feature representations from videos (see the feature-extraction sketch after this list)
Source code of the paper titled *Attentive Visual Semantic Specialized Network for Video Captioning*
An attention-based encoder-decoder model for video captioning on the MSVD dataset (a minimal sketch appears after this list)
[Pattern Recognition 2021] This is the Theano code for our paper "Enhancing the Alignment between Target Words and Corresponding Frames for Video Captioning".
This project uses deep learning to automatically generate contextually relevant captions for videos: it extracts spatial and temporal features and applies Gaussian attention to focus on the most informative regions (see the Gaussian-attention sketch after this list). This improves video indexing, retrieval, and accessibility for visually impaired users.
LSTM RNN and Transformer networks for video captioning on MSVD and MSR-VTT using attributes and SVOS
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian (Bahasa Indonesia).
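For context on the feature-extraction step mentioned above, the following is a minimal, illustrative PyTorch sketch of per-frame feature extraction with a pretrained ResNet-50 (torchvision >= 0.13 assumed). It is not the pipeline of any repository listed here; those typically support several backbones and both appearance and motion features.

```python
# Illustrative sketch only: per-frame appearance features from a pretrained CNN.
# Assumes torchvision >= 0.13 (weights="DEFAULT"); repositories above may differ.
import torch
import torchvision.models as models
import torchvision.transforms as T

# Load a pretrained ResNet-50 and drop its classification head.
backbone = models.resnet50(weights="DEFAULT")
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(frames):
    """frames: list of PIL images sampled from a video -> (num_frames, 2048) tensor."""
    batch = torch.stack([preprocess(f) for f in frames])
    return backbone(batch)
```

In practice, frames are sampled uniformly from each clip and the resulting per-frame features are saved to disk before caption-model training.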
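The attention-based encoder-decoder design behind several of these captioning repositories can be summarized in a few lines. The sketch below is a simplified, hypothetical PyTorch module (the dimensions and the concatenation-based attention scoring are placeholders), not the implementation of any specific repository: a GRU encodes the frame features, and at each decoding step attention re-weights the encoder states into a context vector.

```python
import torch
import torch.nn as nn

class AttnCaptioner(nn.Module):
    """Minimal attention-based encoder-decoder for video captioning (illustrative only)."""
    def __init__(self, feat_dim=2048, hid_dim=512, vocab_size=10000, emb_dim=300):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hid_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attn = nn.Linear(hid_dim * 2, 1)          # concatenation-based scoring
        self.decoder = nn.GRUCell(emb_dim + hid_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, feats, captions):
        # feats: (B, T, feat_dim) frame features; captions: (B, L) token ids
        enc_out, h = self.encoder(feats)               # (B, T, H), (1, B, H)
        h = h.squeeze(0)
        logits = []
        for t in range(captions.size(1)):
            # Score each frame against the current decoder state, then softmax over time.
            scores = self.attn(torch.cat(
                [enc_out, h.unsqueeze(1).expand_as(enc_out)], dim=-1)).squeeze(-1)
            ctx = (scores.softmax(dim=1).unsqueeze(-1) * enc_out).sum(dim=1)
            h = self.decoder(torch.cat([self.embed(captions[:, t]), ctx], dim=-1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)              # (B, L, vocab_size)
```

With frame features of shape (B, T, 2048) and integer caption tokens, `forward` returns per-step vocabulary logits suitable for cross-entropy training with teacher forcing.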
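"Gaussian attention", as mentioned in the caption-generation project above, generally means weighting features with a Gaussian profile rather than free-form attention weights. Below is a minimal sketch of a temporal variant; the function name, the (B, T, D) layout, and the per-clip center/width parameters are assumptions for illustration, and the project's actual mechanism may operate spatially and differ in detail.

```python
import torch

def gaussian_temporal_attention(frame_feats, center, sigma):
    """Weight frame features with a Gaussian over temporal positions (illustrative sketch).

    frame_feats: (B, T, D) frame features; center, sigma: (B,) in frame units.
    Returns a (B, D) attended feature per clip.
    """
    B, T, _ = frame_feats.shape
    positions = torch.arange(T, dtype=frame_feats.dtype, device=frame_feats.device)
    # Unnormalized Gaussian weight for each frame position, per batch element.
    weights = torch.exp(
        -0.5 * ((positions.unsqueeze(0) - center.unsqueeze(1)) / sigma.unsqueeze(1)) ** 2)
    weights = weights / weights.sum(dim=1, keepdim=True)
    return (weights.unsqueeze(-1) * frame_feats).sum(dim=1)
```

Intuitively, a center near T/2 with a large sigma approaches uniform mean pooling, while a small sigma concentrates the caption model on a few frames around the predicted center.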