Gary-code / Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Large Video Models

Awesome-LLMs-for-Video-Understanding

Video Understanding

| Title | Date | Code | Data | Venue |
|---|---|---|---|---|
| Video-LLaMA: An Instruction-Finetuned Visual Language Model for Video Understanding | 06/2023 | code | - | |
| LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning | 06/2023 | code | - | |
| VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | 04/2023 | code | - | |
| Garbage in, garbage out: Zero-shot detection of crime using Large Language Models | 07/2023 | code | - | |
| InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | 07/2023 | code | - | - |
| SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension | 07/2023 | code | - | - |
| Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | 06/2023 | code | - | |
| VALLEY: Video Assistant with Large Language model Enhanced abilitY | 06/2023 | code | - | |
| Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration | 06/2023 | code | - | |
| Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models | 06/2023 | - | - | |
| MIMIC-IT: Multi-Modal In-Context Instruction Tuning | 06/2023 | code | - | |
| Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks | 06/2023 | code | - | - |
| FunQA: Towards Surprising Video Comprehension | 06/2023 | code | - | - |
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | 05/2023 | code | - | - |
| VideoChat: Chat-Centric Video Understanding | 05/2023 | code | demo | |
| VideoLLM: Modeling Video Sequence with Large Language Models | 05/2023 | code | - | |
| Self-Chained Image-Language Model for Video Localization and Question Answering | 05/2023 | code | - | |
| A Video Is Worth 4096 Tokens: Verbalize Story Videos To Understand Them In Zero Shot | 05/2023 | - | - | |
| Let's Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction | 05/2023 | - | - | |
| Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering | 04/2023 | - | - | |
| VLog: Video as a Long Document | 04/2023 | demo | - | |
| Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions | 04/2023 | code | - | |
| ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System | 04/2023 | project page | - | |
| CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos | 03/2023 | code | - | |
| Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering | 03/2023 | code | - | |
| mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | 02/2023 | code | - | |
| Learning Video Representations from Large Language Models | 12/2022 | code | - | |
| Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners | 05/2022 | code | - | |
| Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language | 04/2022 | project page | - | |

Video Generation

| Title | Date | Code | Data | Venue |
|---|---|---|---|---|
| NExT-GPT: Any-to-Any Multimodal LLM | 09/2023 | code | - | |
| Generative Pretraining in Multimodality | 07/2023 | code | - | |

Dataset

| Title | Date | Code | Data | Venue |
|---|---|---|---|---|
| VidChapters-7M: Video Chapters at Scale | 09/2023 | code | - | |
| InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | 07/2023 | code | - | |
| VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | 04/2023 | code | - | |
| Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks | 06/2023 | code | - | |
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | 05/2023 | code | - | |

Evaluation

| Title | Date | Code | Data | Venue |
|---|---|---|---|---|
| Let’s Think Frame by Frame: Evaluating Video Chain of Thought with Video Infilling and Prediction | 05/2023 | code | - | |
| SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension | 07/2023 | code | - | |
| Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks | 07/2023 | code | - | |
| FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation | 11/2023 | code | - | |
| VLM-Eval: A General Evaluation on Video Large Language Models | 11/2023 | - | - | |
