Yongming Rao's starred repositories
Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
ShareGPT4Video
An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
HunyuanDiT
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
LLaVA-RLHF
Aligning LMMs with Factually Augmented RLHF
Chain-of-Spot
Chain-of-Spot: Interactive Reasoning Improves Large Vision-language Models
CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
LLaMA2-Accessory
An Open-source Toolkit for LLM Development