Video-ChatGPT 🎥 💬

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

Installation 🔧

We recommend setting up a conda environment for the project:

conda create --name=video_chatgpt python=3.10
conda activate video_chatgpt

git clone https://github.com/mbzuai-oryx/Video-ChatGPT.git
cd Video-ChatGPT
pip install -r requirements.txt

export PYTHONPATH="./:$PYTHONPATH"

Additionally, install FlashAttention for training:

pip install ninja

git clone https://github.com/HazyResearch/flash-attention.git
cd flash-attention
git checkout v1.0.7
python setup.py install
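
After the steps above, a quick sanity check can confirm that the environment looks usable before starting a demo or training run. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and the module names `torch` and `flash_attn` are assumed to be the import names of PyTorch and FlashAttention (adjust them if your requirements.txt differs). The Python 3.10 minimum mirrors the conda command above.

```python
import importlib.util
import sys

# Assumed import names (not taken from this README): "torch" for PyTorch and
# "flash_attn" for FlashAttention. Adjust if your installation differs.
REQUIRED_MODULES = ("torch", "flash_attn")


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 10)):
    """Return a dict mapping each check name to True (ok) or False (missing)."""
    report = {"python>=%d.%d" % min_python: sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec locates a top-level module without actually importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {check}")
```

Run it inside the activated video_chatgpt environment; any line marked MISSING points to an installation step that needs to be repeated.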

Running Demo Offline 💿

To run the demo offline, please refer to the instructions in offline_demo.md.

Training 🚋

For training instructions, check out train_video_chatgpt.md.

Video Instruction Dataset for ADL:

If you would like access to the dataset and features, please let me know.

Qualitative Analysis 🔍

A Comprehensive Evaluation of Video-ChatGPT's Performance across Multiple Tasks.

Video Reasoning Tasks 🎥

Creative and Generative Tasks 🖌️

Spatial Understanding 🌐

Video Understanding and Conversational Tasks 💬

Action Recognition 🏃

Question Answering Tasks ❓

Temporal Understanding ⏳

License 📜

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

About

Video-ChatGPT is a video conversation model capable of generating meaningful conversations about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous quantitative evaluation benchmark for video-based conversational models.
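
This README does not spell out how the visual encoder is adapted for spatiotemporal representation. One simple way to turn per-frame patch embeddings into a compact video representation is to average them along the patch axis (one token per frame) and along the frame axis (one token per patch position), then concatenate the two sets. The sketch below illustrates that idea under stated assumptions: the function names and the nested-list layout `features[t][n]` are hypothetical, and the code is not taken from this repository.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def spatiotemporal_pool(features):
    """Pool frame-level features into temporal and spatial tokens.

    features[t][n] is the D-dimensional embedding of patch n in frame t
    (hypothetical layout: T frames, N patches per frame).
    Returns T temporal tokens followed by N spatial tokens, each of length D.
    """
    num_frames = len(features)
    num_patches = len(features[0])
    # Temporal tokens: average over all patches within each frame -> T vectors
    temporal = [mean(frame) for frame in features]
    # Spatial tokens: average each patch position over all frames -> N vectors
    spatial = [
        mean([features[t][n] for t in range(num_frames)])
        for n in range(num_patches)
    ]
    return temporal + spatial


if __name__ == "__main__":
    # Toy input: T=3 frames, N=2 patches, D=2 dimensions
    toy = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]],
        [[9.0, 10.0], [11.0, 12.0]],
    ]
    for token in spatiotemporal_pool(toy):
        print(token)
```

The appeal of this kind of pooling is that the token count grows as T + N rather than T × N, which keeps the sequence passed to the language model short.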
