There are 5 repositories under the video-question-answering topic.
[CVPR2024][VideoChatGPT] ChatGPT with video understanding, plus support for many more LMs such as miniGPT4, StableLM, and MOSS.
InternVideo: General Video Foundation Models via Generative and Discriminative Learning (https://arxiv.org/abs/2212.03191)
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
A PyTorch implementation of VIOLET
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
[CVPR 2022] A large-scale public benchmark dataset for video question answering, focused on evidence and commonsense reasoning. Includes the code used in our paper "From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering", CVPR 2022.
A PyTorch implementation of EmpiricalMVM
A new multi-shot video understanding benchmark Shot2Story20K with detailed shot-level captions and comprehensive video summaries.
ROCK model for Knowledge-Based VQA in Videos
[ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
PyTorch code for ROLL, a knowledge-based video story question answering model.
DramaQA Starter Code (2021)
A simple attention deep learning model to answer questions about a given video with the most relevant video intervals as answers.
LifeQA website code
Code for ACL SRW 2023 paper "Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering"
[ICCV 2021] On the hidden treasure of dialog in video question answering
Multi-Scale Progressive Attention Network for Video Question Answering
Code for ACL SustaiNLP 2023 paper "Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering"
Part of my work for my Bachelor's Thesis Project on Counterfactual Reasoning for Videos.