Haochen-Wang409

CASIA, UCAS

Beijing, China

haochen-wang409.github.io

Haochen Wang's starred repositories

VLoRA

[NeurIPS 2024] Visual Perception by Large Language Model’s Weights

Language: Python | License: Apache-2.0 | 900

Open-O1

Language: Python | License: Apache-2.0 | 68000

LLaVA-NeXT

Language: Python | License: Apache-2.0 | 267600

Lumina-mGPT

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"

Language: Python | 48200

DIVA

Diffusion Feedback Helps CLIP See Better

Language: Python | License: MIT | 21000

flux

Official inference repo for FLUX.1 models

Language: Python | License: Apache-2.0 | 1508400

mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Language: Python | License: MIT | 89400

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | License: Apache-2.0 | 2562200

latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Jupyter Notebook | License: MIT | 1170600

Open-MAGVIT2

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Language: Python | License: Apache-2.0 | 65700

VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks

Language: Python | License: Apache-2.0 | 119800

lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Language: Python | License: NOASSERTION | 156300

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python | License: MIT | 125900

TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Language: Python | License: Apache-2.0 | 777200

matryoshka-mm

Matryoshka Multimodal Models

Language: Python | License: Apache-2.0 | 7700

subobjects

Official repository of paper "Subobject-level Image Tokenization"

Language: Python | 6100

LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

EVA

EVA Series: Visual Representation Fantasies from BAAI

Language: Python | License: MIT | 226800

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | 2672800

enhancing-transformers

An unofficial implementation of both ViT-VQGAN and RQ-VAE in Pytorch

Language: Python | License: MIT | 28300

VAR

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | License: MIT | 415100

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | 1980500

DreamLLM

[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation

Language: Python | License: Apache-2.0 | 38500

SEED

Official implementation of SEED-LLaMA (ICLR 2024).

Language: Python | License: NOASSERTION | 57300

Semantic-SAM

[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"

Language: Python | 230300

Maskgit-pytorch

Language: Jupyter Notebook | License: MIT | 15600

taming-transformers

Taming Transformers for High-Resolution Image Synthesis

Language: Jupyter Notebook | License: MIT | 575300

magvit2-pytorch

Implementation of MagViT2 Tokenizer in Pytorch

Language: Python | License: MIT | 55400

vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

Language: Python | License: MIT | 251500

DINOv

[CVPR 2024] Official implementation of the paper "Visual In-context Learning"

Language: Python | 37500