Yongming Rao's starred repositories

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python · License: Apache-2.0 · Stargazers: 1144

MM-NIAH

This is the official implementation of the paper "Needle In A Multimodal Haystack"

Language: Python · Stargazers: 47

PLLaVA

Official repository for the paper PLLaVA

Language: Python · Stargazers: 456

Lumina-T2X

Lumina-T2X is a unified framework for text-to-any-modality generation

Language: Python · License: MIT · Stargazers: 1805

ShareGPT4Video

An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Language: Python · Stargazers: 1095

Omost

Your image is almost there!

Language: Python · License: Apache-2.0 · Stargazers: 6696

flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Language: Python · License: MIT · Stargazers: 693

LaViLa

Code release for "Learning Video Representations from Large Language Models"

Language: Python · License: MIT · Stargazers: 457

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python · License: Apache-2.0 · Stargazers: 7696

HallusionBench

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Language: Python · License: BSD-3-Clause · Stargazers: 200

HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Language: Python · License: NOASSERTION · Stargazers: 2578

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

License: MIT · Stargazers: 2805

RADIO

Official repository for "AM-RADIO: Reduce All Domains Into One"

Language: Python · License: NOASSERTION · Stargazers: 495

HPT

HPT - Open Multimodal LLMs from HyperGAI

Language: Python · License: Apache-2.0 · Stargazers: 301

G-LLaVA

Official GitHub repo of G-LLaVA

Language: Python · Stargazers: 105

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python · License: Apache-2.0 · Stargazers: 3066

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python · License: MIT · Stargazers: 3761

FeatUp

Official code for "FeatUp: A Model-Agnostic Frameworkfor Features at Any Resolution" ICLR 2024

Language: Jupyter Notebook · License: MIT · Stargazers: 1278

SpeeD

SpeeD: A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Language: Python · License: Apache-2.0 · Stargazers: 121

LLaVA-RLHF

Aligning LMMs with Factually Augmented RLHF

Language: Python · License: GPL-3.0 · Stargazers: 270

dust3r

DUSt3R: Geometric 3D Vision Made Easy

Language: Python · License: NOASSERTION · Stargazers: 4620

VILA

VILA - a multi-image visual language model with a training, inference, and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python · License: Apache-2.0 · Stargazers: 865

Chain-of-Spot

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

Language: Python · License: Apache-2.0 · Stargazers: 78

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python · License: Apache-2.0 · Stargazers: 19906

CapsFusion

[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale

Language: Python · Stargazers: 179

grok-1

Grok open release

Language: Python · License: Apache-2.0 · Stargazers: 49124

VQASynth

Compose multimodal datasets 🎹

Language: Python · Stargazers: 115

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python · License: MIT · Stargazers: 1842

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python · License: NOASSERTION · Stargazers: 2603