video-question-answering

There are 7 repositories under video-question-answering topic.

OpenGVLab / Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
captioning-videos chatgpt gradio langchain video-question-answering video-understanding stablelm chat video big-model foundation-models large-language-models large-model
Language:Python 2855
OpenGVLab / InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
foundation-models video-understanding vision-transformer action-recognition masked-autoencoder multimodal open-set-recognition spatio-temporal-action-localization temporal-action-localization video-question-answering video-retrieval zero-shot-classification zero-shot-retrieval benchmark contrastive-learning self-supervised instruction-tuning video-data video-dataset video-clip
Language:Python 1099
jayleicn / ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
cvpr2021 pytorch video-question-answering video-retrieval vision-and-language vqa
Language:Python 692
Vision-CAIR / MiniGPT4-video
Official code for MiniGPT4-video
video-question-answering video-understanding
Language:Python 440
X-PLUG / Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
benchmark chinese dataset mllm multimodal multimodal-large-language-models multimodal-pretraining video video-question-answering video-retrieval youku
Language:Python 267
X-PLUG / mPLUG-2
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
foundation-models mllm multimodal multimodal-pretraining video image-retrieval mplug video-question-answering video-retrieval vqa
Language:Python 212
salesforce / ALPRO
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
prompt-learning representation-learning video-language video-question-answering video-text-retrieval vision-and-language
Language:Python 185
antoyang / FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
large-language-models multimodal-learning pre-training video-question-answering video-understanding videoqa vision-and-language visual-question-answering vqa weakly-supervised-learning
Language:Python 147
tsujuifu / pytorch_violet
A PyTorch implementation of VIOLET
pytorch vision-and-language pre-training video-retrieval video-question-answering
Language:Python 135
jayleicn / TVQAplus
[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering
video-question-answering dataset tvqa pytorch
Language:Python 121
just-ask
antoyang / just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
vqa visual-question-answering videoqa video-question-answering video-understanding question-generation weakly-supervised-learning vision-and-language pre-training multimodal-learning
Language:Jupyter Notebook 114
doc-doc / NExT-QA
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
vision-language multi-object-interaction causal-temporal-action-reasoning video-question-answering video-understanding videoqa
Language:Python 109
jpthu17 / EMCL
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
cross-modal-retrieval neurips video-captioning video-question-answering video-retrieval
Language:Python 108
jpthu17 / HBI
[CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
cross-modal-retrieval cvpr video-question-answering video-retrieval
Language:Python 97
bytedance / Shot2Story
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
benchmark dataset large-language-models video-language video-language-pretraining video-question-answering video-summarization vision-language video-captioning video-story video-story-generation
Language:Python 69
mlvlab / Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
emnlp2023 large-language-models multi-modal video-question-answering visual-question-answering
Language:Python 62
bcmi / Causal-VidQA
[CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The code used in our paper "From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering", CVPR2022.
commonsense-reasoning video-question-answering evidence-reason video-question-answering-dataset visual-understanding
Language:Python 49
sail-sg / VGT
Video Graph Transformer for Video Question Answering (ECCV'22)
graph-transformer temporal-dynamics video-language-understanding video-question-answering videoqa
Language:Python 45
zchoi / PKOL
[TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”
pytorch pytorch-implementation video-question-answering video-retrieval vision-language
Language:Python 45
tsujuifu / pytorch_empirical-mvm
A PyTorch implementation of EmpiricalMVM
cvpr2023 pytorch pre-training video-captioning video-question-answering video-retrieval vision-and-language
Language:Python 37
doc-doc / NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
video-question-answering videoqa trustworthy-vqa visual-evidence-grounding video-language-understanding video-grounding
Language:Python 36
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
chatbot chatgpt llava multimodal-large-language-models training-free video-question-answering video-understanding vision-language-model zero-shot-video-captioning
Language:Python 35
mlvlab / MELTR
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)
cvpr2023 meta-learning multi-modal video-captioning video-question-answering video-retrieval
Language:Python 32
noagarcia / knowit-rock
ROCK model for Knowledge-Based VQA in Videos
vqa visual-question-answering video-question-answering knowledge-base
Language:Python 30
doc-doc / HQGA
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)
videoqa conditional-graph-hierarchy vision-language video-question-answering
Language:Python 29
XLiu443 / Tem-adapter
[ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
clip-model video-question-answering video-understanding
Language:Python 27
yl3800 / IGV
This repo contains code for Invariant Grounding for Video Question Answering
generalization interpretable invariant-learning video video-question-answering videoqa cvpr-2022 cvpr-oral-2022
Language:Python 26
noagarcia / ROLL-VideoQA
PyTorch code for ROLL, a knowledge-based video story question answering model.
knowledge-based-reasoning video-question-answering video-understanding visual-question-answering
Language:Python 19
doc-doc / CoVGT
Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)
contrastive-learning dynamic-visual-graph video-language-understanding videoqa video-question-answering
Language:Python 16
mlvlab / OVQA
Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)
iccv2023 multi-modal video-question-answering visual-question-answering
Language:Python 14
mmazab / LifeQA
Data and PyTorch code for the LifeQA LREC 2020 paper.
dataset pytorch videoqa lifeqa video-question-answering question-answering computer-vision natural-language-processing videos youtube real-life research lrec2020 lrec machine-learning nlp deep-learning
Language:Python 10
liveseongho / DramaQA
DramaQA Starter Code (2021)
video-question-answering multi-modal-learning vision-language pytorch
Language:Python 9
declare-lab / Sealing
[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image--Text Models"
multimodality video-understanding video-question-answering visual-language-models naacl2024
Language:Python 6
Abdelrhman-Yasser / multimedia_question_answering
A simple attention deep learning model to answer questions about a given video with the most relevant video intervals as answers.
nlp video-question-answering deep-learning tensorflow python python3
Language:Python 4
MichiganNLP / lifeqa
LifeQA website code
lifeqa video-question-answering question-answering computer-vision natural-language-processing videos youtube real-life research lrec2020 lrec machine-learning deep-learning videoqa nlp dataset pytorch
Language:HTML 4
lyuchenyang / Semantic-aware-VideoQA
Code for ACL SRW 2023 paepr "Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering"
artificial-intelligence deep-learning machine-learning multi-modal-learning natural-language-processing video-question-answering
Language:Python 2