Zero-to-Hero - Diffusion Models

Repository of lessons exploring image diffusion models, focused on understanding and education.

Introduction

This series is heavily inspired by Andrej Karpathy's Zero to Hero series of videos. Well, actually, we are straight-up copying that series, because the videos are so good. Seriously, if you haven't followed his videos, go do that now - lots of great stuff in there!

Each lesson contains an explanatory video that walks you through the lesson and the code, a Colab notebook that corresponds to the video material, and a pointer to the runnable code on GitHub. All of the code is designed to run on a minimal GPU. We test everything on T4 instances, since that is what Colab provides at the free tier, and they are cheap to run on AWS as standalone instances. In theory, each lesson should be runnable on any GPU with 8GB or more of memory, since all of them are designed to be trained in real time on minimal hardware, so that we can really dive into the code.

Each lesson is in its own subdirectory, and we have ordered the lessons chronologically (from oldest to newest) so that it's easy to trace the development of the research and see the historical progress of this space.

Since every lesson is meant to be trained in real time with minimal cost, most of the lessons are restricted to training on the MNIST dataset, simply because it is quick to train and easy to visualize.

Requirements for All Lessons

All lessons are built using PyTorch and written in Python 3. To set up an environment to run all of the lessons, we suggest using conda or venv:

> python3 -m venv mindiffusion_env
> source mindiffusion_env/bin/activate
> pip install --upgrade pip
> pip install -r requirements.txt

All lessons are designed to be run in the lesson directory, not the root of the repository.
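
Before starting a lesson, it can help to confirm that the environment sees a GPU with enough memory. The snippet below is a minimal sketch, not part of the repository: the helper names `bytes_to_gib` and `gpu_report` and the `MIN_GPU_GIB` threshold are assumptions made for illustration, with the threshold taken from the 8GB guidance in the introduction. A card sold as 8GB may report slightly less than 8 GiB, so treat the verdict as a rough guide.

```python
import importlib.util

# Assumption: threshold taken from the "8GB or greater" guidance in the introduction.
MIN_GPU_GIB = 8.0


def bytes_to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


def gpu_report() -> str:
    """Return a one-line description of the first visible CUDA device, if any."""
    # Check for PyTorch without importing it, so this also runs before installation.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."

    import torch

    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible."

    # Query the name and total memory of device 0.
    props = torch.cuda.get_device_properties(0)
    gib = bytes_to_gib(props.total_memory)
    verdict = "meets" if gib >= MIN_GPU_GIB else "is below"
    return f"{props.name}: {gib:.1f} GiB, which {verdict} the {MIN_GPU_GIB:.0f} GiB guideline."


if __name__ == "__main__":
    print(gpu_report())
```

Running this inside the activated environment prints a single line describing the GPU, or explains why none was found.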

Table of Lessons

Lesson	Date	Name	Title	Colab	Code
1			Introduction to Diffusion Models	colab
2	March 2015	DPM	Deep Unsupervised Learning using Nonequilibrium Thermodynamics	colab	code
3	July 2019	NCSN	Generative Modeling by Estimating Gradients of the Data Distribution		code
4	June 2020	NCSNv2	Improved Techniques for Training Score-Based Generative Models		code
5	June 2020	DDPM	Denoising Diffusion Probabilistic Models		code
5a			DDPM with Dropout		code
5b			Interpolation in Latent Space		code
5c			Adding Control - Basic Class Conditioning with Cross-Attention		code
5d			Adding Control - Extended Class Conditioning		code
5e			Adding Control - Text-to-Image		code
6	October 2020	DDIM	Denoising Diffusion Implicit Models		code
7	November 2020	Score SDE	Score-Based Generative Modeling through Stochastic Differential Equations		code
8	February 2021	DALL-E	Zero-Shot Text-to-Image Generation		code
9	February 2021	IDDPM	Improved Denoising Diffusion Probabilistic Models		code
10	April 2021	SR3	Image Super-Resolution via Iterative Refinement		code
11	May 2021	Guided Diffusion	Diffusion Models Beat GANs on Image Synthesis		code
12	May 2021	CDM	Cascaded Diffusion Models for High Fidelity Image Generation		code
13	December 2021	Latent Diffusion	High-Resolution Image Synthesis with Latent Diffusion Models		code
13a		Stable Diffusion v1
13b		Stable Diffusion v2
14	December 2021	CFG	Classifier-Free Diffusion Guidance		code
15	December 2021	GLIDE	GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models		code
16	February 2022		Progressive Distillation for Fast Sampling of Diffusion Models
17	April 2022	DALL-E 2	Hierarchical Text-Conditional Image Generation with CLIP Latents		code
18	May 2022	Imagen	Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding		code
19	October 2022		Flow Matching for Generative Modeling
20	October 2022	ERNIE-ViLG 2.0	ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts
21	December 2022	DiT	Scalable Diffusion Models with Transformers		code
22	January 2023	Simple Diffusion	Simple diffusion: End-to-end diffusion for high resolution images
23	February 2023	ControlNet	Adding Conditional Control to Text-to-Image Diffusion Models
24	May 2023	RAPHAEL	RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
25	June 2023	Wuerstchen	Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models
26	July 2023	SDXL	SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
27	September 2023	PixArt-α	PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis		code
28	October 2023	DALL-E 3	Improving Image Generation with Better Captions
29	January 2024	PIXART-δ	PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
30	March 2024	Stable Diffusion 3	Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
31	March 2024	PixArt-Σ	PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
