OpenGVLab / VideoChat-R1

[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning


🔥 Updates

  • 2025/04/22: 🔥🔥🔥 We release VideoChat-R1-caption on Hugging Face.
  • 2025/04/14: 🔥🔥🔥 We release VideoChat-R1 and VideoChat-R1-thinking on Hugging Face.
  • 2025/04/10: 🔥🔥🔥 We release our paper and code.

🦜 Introduction


Demo & Inference

Refer to the hf README for instructions on running inference with our model; a quick-start sketch follows below.
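As a minimal example, the sketch below loads a released checkpoint with Hugging Face transformers, assuming the model exposes the Qwen2.5-VL interface it is built on. The model id `OpenGVLab/VideoChat-R1_7B`, the demo video path, and the prompt are placeholders; confirm the exact id and preprocessing against the hf README.

```python
# Minimal inference sketch, assuming a Qwen2.5-VL-style checkpoint.
# The model id and video path below are assumptions, not confirmed values.
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "OpenGVLab/VideoChat-R1_7B"  # assumed id; check Hugging Face
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# One user turn containing a video plus a temporal-grounding question.
messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "demo.mp4", "fps": 1.0},
        {"type": "text", "text": "When does the person open the door? "
                                 "Answer with start and end timestamps."},
    ],
}]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
out = out[:, inputs.input_ids.shape[1]:]  # drop the prompt tokens
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```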

Evaluation

See eval_scripts and lmms-eval_videochat.

Training

See training_scripts.

📄 Citation

If you find this project useful in your research, please consider citing:

@article{li2025videochatr1,
  title={VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning},
  author={Li, Xinhao and Yan, Ziang and Meng, Desen and Dong, Lu and Zeng, Xiangyu and He, Yinan and Wang, Yali and Qiao, Yu and Wang, Yi and Wang, Limin},
  journal={arXiv preprint arXiv:2504.06958},
  year={2025}
}
