Yongming Rao's starred repositories

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python · License: Apache-2.0 · Stargazers: 1144

MM-NIAH

This is the official implementation of the paper "Needle In A Multimodal Haystack"

Language: Python · Stargazers: 47

PLLaVA

Official repository for the paper PLLaVA

Language: Python · Stargazers: 456

Lumina-T2X

Lumina-T2X is a unified framework for text-to-any-modality generation

Language: Python · License: MIT · Stargazers: 1805

ShareGPT4Video

An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Language: Python · Stargazers: 1095

Omost

Your image is almost there!

Language: Python · License: Apache-2.0 · Stargazers: 6696

flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Language: Python · License: MIT · Stargazers: 693

LaViLa

Code release for "Learning Video Representations from Large Language Models"

Language: Python · License: MIT · Stargazers: 457

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python · License: Apache-2.0 · Stargazers: 7696

HallusionBench

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Language: Python · License: BSD-3-Clause · Stargazers: 200

HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Language: Python · License: NOASSERTION · Stargazers: 2578

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

License: MIT · Stargazers: 2805

RADIO

Official repository for "AM-RADIO: Reduce All Domains Into One"

Language: Python · License: NOASSERTION · Stargazers: 495

HPT

HPT - Open Multimodal LLMs from HyperGAI

Language: Python · License: Apache-2.0 · Stargazers: 301

G-LLaVA

Official GitHub repo of G-LLaVA

Language: Python · Stargazers: 105

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python · License: Apache-2.0 · Stargazers: 3066

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python · License: MIT · Stargazers: 3761

FeatUp

Official code for "FeatUp: A Model-Agnostic Frameworkfor Features at Any Resolution" ICLR 2024

Language: Jupyter Notebook · License: MIT · Stargazers: 1278

SpeeD

SpeeD: A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Language: Python · License: Apache-2.0 · Stargazers: 121

LLaVA-RLHF

Aligning LMMs with Factually Augmented RLHF

Language: Python · License: GPL-3.0 · Stargazers: 270

dust3r

DUSt3R: Geometric 3D Vision Made Easy

Language: Python · License: NOASSERTION · Stargazers: 4620

VILA

VILA - a multi-image visual language model with a training, inference, and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python · License: Apache-2.0 · Stargazers: 865

Chain-of-Spot

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

Language: Python · License: Apache-2.0 · Stargazers: 78

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python · License: Apache-2.0 · Stargazers: 19906

CapsFusion

[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale

Language: Python · Stargazers: 179

grok-1

Grok open release

Language: Python · License: Apache-2.0 · Stargazers: 49124

VQASynth

Compose multimodal datasets 🎹

Language: Python · Stargazers: 115

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python · License: MIT · Stargazers: 1842

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python · License: NOASSERTION · Stargazers: 2603