Yuanhan Zhang's starred repositories
Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
VideoMamba
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
ring-flash-attention
Ring attention implementation with flash attention
ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch
prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
ttt-lm-jax
Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
scaling_on_scales
When do we not need larger vision models?
video_captioning_datasets
A summary of Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
HD-VG-130M
The HD-VG-130M Dataset
LongVideoBench
Official Dataloader and Evaluation Scripts for LongVideoBench.
MMLongBench-Doc
Official repository of MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
CVRR-Evaluation-Suite
Official repository of the paper "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs".