Repository of lessons exploring image diffusion models, focused on understanding and education.
This series is heavily inspired by Andrej Karpathy's Zero to Hero series of videos. Well, actually, we are straight up copying that series, because those videos are so good. Seriously, if you haven't followed his videos, go do that now - lots of great stuff in there!
Each lesson contains an explanatory video that walks you through the lesson and the code, a Colab notebook that corresponds to the video material, and a pointer to the runnable code on GitHub. All of the code is designed to run on a minimal GPU. We test everything on T4 instances, since that is what Colab provides at the free tier, and they are cheap to run on AWS as standalone instances. Each lesson should be runnable on any GPU with 8GB or more of memory, since all of them are designed to be trained in real time on minimal hardware, so that we can really dive into the code.
Each lesson is in its own subdirectory, and we have ordered the lessons historically (oldest to newest) so that it's easy to trace the development of the research and see the historical progress of this space.
Since every lesson is meant to be trained in real time at minimal cost, most of the lessons are restricted to training on the MNIST dataset, simply because it is quick to train on and easy to visualize.
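To give a flavor of what the lessons cover, here is an illustrative sketch (not the lessons' actual code) of the DDPM forward noising process q(x_t | x_0) applied to an MNIST-shaped image, using NumPy rather than PyTorch so it runs anywhere; the linear beta schedule values are the commonly used ones from the DDPM paper:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)          # cumulative product abar_t
    noise = rng.standard_normal(x0.shape)   # epsilon ~ N(0, I)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

rng = np.random.default_rng(0)
x0 = rng.random((1, 28, 28))               # stand-in for a 1x28x28 MNIST digit
betas = np.linspace(1e-4, 0.02, 1000)      # linear schedule, T = 1000 steps
xt = forward_diffuse(x0, t=999, betas=betas, rng=rng)
print(xt.shape)  # (1, 28, 28)
```

At t = 999 the cumulative product of the alphas is tiny, so x_t is essentially pure Gaussian noise, which is exactly what the reverse (learned) process in each lesson is trained to undo.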
All lessons are built using PyTorch and written in Python 3. To set up an environment to run all of the lessons, we suggest using conda or venv (the commands below use venv):
> python3 -m venv mindiffusion_env
> source mindiffusion_env/bin/activate
> pip install --upgrade pip
> pip install -r requirements.txt
All lessons are designed to be run in the lesson directory, not the root of the repository.
- Emu (abstract)
- CogView (abstract)
- CogView 2 (abstract)
- CogView 3 (abstract)
- Consistency Models (abstract)
- Latent Consistency Models (abstract)
- Scalable Diffusion Models with State Space Backbone (abstract)
- Palette: Image-to-Image Diffusion Models (abstract)
Most of the implementations have been consolidated into a single image diffusion repository, configurable through YAML files.
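As a rough sketch of what such a configuration might look like, a YAML file could specify the model and training setup along these lines (the keys and values here are hypothetical - see the repository's own YAML files for the real schema):

```yaml
# Hypothetical example only; consult the repository's YAML files for actual keys.
model:
  type: ddpm
  image_size: 28
  channels: 1
training:
  dataset: mnist
  batch_size: 128
  num_steps: 10000
```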
If you are interested in video diffusion models, take a look through video diffusion models, where we are adding implementations of the latest video diffusion model papers, trained on an equivalent MNIST dataset for video.