Repository of lessons exploring image diffusion models, focused on understanding and education.
This series is heavily inspired by Andrej Karpathy's Zero to Hero series of videos. Well, actually, we are straight up copying that series, because those videos are so good. Seriously, if you haven't followed his videos, go do that now - lots of great stuff in there!
Each lesson contains an explanatory video that walks you through the lesson and the code, a Colab notebook that corresponds to the video material, and a pointer to the runnable code on GitHub. All of the code is designed to run on a minimal GPU. We test everything on T4 instances, since that is what Colab provides at the free tier, and they are cheap to run on AWS as standalone instances. Each lesson should be runnable on any GPU with 8GB or more of memory, since all of them are designed to be trained in real time on minimal hardware, so that we can really dive into the code.
Each lesson is in its own subdirectory, and we have ordered the lessons historically (oldest to newest) so that it's easy to trace the development of the research and see the historical progress of this space.
Since every lesson is meant to be trained in real time at minimal cost, most of the lessons are restricted to training on the MNIST dataset, simply because it is quick to train on and easy to visualize.
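To give a flavor of what the lessons cover, here is an illustrative sketch (not the lessons' actual code) of the DDPM forward noising process q(x_t | x_0) applied to an MNIST-shaped image, using NumPy rather than PyTorch so it runs anywhere; the linear beta schedule values are the commonly used ones from the DDPM paper:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)          # cumulative product abar_t
    noise = rng.standard_normal(x0.shape)   # epsilon ~ N(0, I)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

rng = np.random.default_rng(0)
x0 = rng.random((1, 28, 28))               # stand-in for a 1x28x28 MNIST digit
betas = np.linspace(1e-4, 0.02, 1000)      # linear schedule, T = 1000 steps
xt = forward_diffuse(x0, t=999, betas=betas, rng=rng)
print(xt.shape)  # (1, 28, 28)
```

At t = 999 the cumulative product of the alphas is tiny, so x_t is essentially pure Gaussian noise, which is exactly what the reverse (learned) process in each lesson is trained to undo.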
All lessons are built using PyTorch and written in Python 3. To set up an environment to run all of the lessons, we suggest using conda or venv (the commands below use venv):
> python3 -m venv mindiffusion_env
> source mindiffusion_env/bin/activate
> pip install --upgrade pip
> pip install -r requirements.txt
All lessons are designed to be run in the lesson directory, not the root of the repository.
- Emu (abstract)
- CogView (abstract)
- CogView 2 (abstract)
- CogView 3 (abstract)
- Consistency Models (abstract)
- Latent Consistency Models (abstract)
- Scalable Diffusion Models with State Space Backbone (abstract)
- Palette: Image-to-Image Diffusion Models (abstract)
Most of the implementations have been consolidated into a single image diffusion repository, configurable through YAML files.
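As a rough sketch of what such a configuration might look like, a YAML file could specify the model and training setup along these lines (the keys and values here are hypothetical - see the repository's own YAML files for the real schema):

```yaml
# Hypothetical example only; consult the repository's YAML files for actual keys.
model:
  type: ddpm
  image_size: 28
  channels: 1
training:
  dataset: mnist
  batch_size: 128
  num_steps: 10000
```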
If you are interested in video diffusion models, take a look through video diffusion models, where we are adding implementations of the latest video diffusion model papers, trained on an equivalent MNIST dataset for video.