Haochen Wang's starred repositories
Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
Open-MAGVIT2
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
matryoshka-mm
Matryoshka Multimodal Models
subobjects
Official repository of the paper "Subobject-level Image Tokenization"
LLM-in-Vision
Recent LLM-based CV and related works. Comments and contributions welcome!
enhancing-transformers
An unofficial implementation of both ViT-VQGAN and RQ-VAE in PyTorch
VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Semantic-SAM
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
taming-transformers
Taming Transformers for High-Resolution Image Synthesis
magvit2-pytorch
Implementation of MagViT2 Tokenizer in PyTorch
vector-quantize-pytorch
Vector (and Scalar) Quantization, in PyTorch