ZhendongWang6

University of Science and Technology of China (USTC)

Hefei, China

https://zhendongwang6.github.io/

Zhendong Wang's starred repositories

diffusion-forcing

code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Language: Python | MIT | 31800

Kolors

Kolors Team

Language: Python | Apache-2.0 | 239800

Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

MIT | 199300

FastV

[ECCV 2024] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Language: Python | 16900

EG4D

Official implementation of EG4D: Explicit Generation of 4D Object without Score Distillation

1700

Lumina-T2X

Lumina-T2X is a unified framework for Text to Any Modality Generation

Language: Python | MIT | 189600

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

MIT | 297700

InstanceDiffusion

[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"

Language: Python | Apache-2.0 | 43600

RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

Language: Jupyter Notebook | 160100

DiffusionDPO

Code for "Diffusion Model Alignment Using Direct Preference Optimization"

Language: Python | Apache-2.0 | 18300

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | MIT | 384700

Latte

Latte: Latent Diffusion Transformer for Video Generation.

Language: Python | Apache-2.0 | 149800

OpenDiT

OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference

Language: Python | Apache-2.0 | 134400

GaussianCube

GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling

Language: Python | 26700

ControlNet-XS

Language: Python | Apache-2.0 | 41700

img2img-turbo

One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more

Language: Python | MIT | 133000

grok-1

Grok open release

Language: Python | Apache-2.0 | 4917900

RectifiedFlow

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Language: Python | 69300

CogVLM

a state-of-the-art-level open visual language model | multimodal pre-trained model

Language: Python | Apache-2.0 | 565200

sd-forge-layerdiffuse

[WIP] Layer Diffusion for WebUI (via Forge)

Language: Python | Apache-2.0 | 366200

Awesome-Controllable-T2I-Diffusion-Models

A collection of resources on controllable generation with text-to-image diffusion models.

MIT | 76400

FiT

[ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model

Apache-2.0 | 34300

fast-DiT

Fast Diffusion Models with Transformers

Language: Python | NOASSERTION | 62200

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | NOASSERTION | 569200

fastcomposer

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention

Language: Python | MIT | 62600

minSDXL

Huggingface-compatible SDXL Unet implementation that is readily hackable

Language: Jupyter Notebook | 36600

weak-to-strong

Language: Python | MIT | 245700

LVM

Language: Python | Apache-2.0 | 170400

MaskTextSpotterV3

The code of "Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting"

Language: Python | NOASSERTION | 61800

img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Language: Python | MIT | 345300