ddpm generative-model machine-learning paper-implementations pytorch

Dali/DDPM

Diffusion Models — Introduction

Diffusion Models are generative models, meaning that they are used to generate data similar to the data on which they are trained. Fundamentally, Diffusion Models work by destroying training data through the successive addition of Gaussian noise, and then learning to recover the data by reversing this noising process. After training, we can use the Diffusion Model to generate data by simply passing randomly sampled noise through the learned denoising process.

Diffusion models are inspired by non-equilibrium thermodynamics. They define a Markov chain of diffusion steps to slowly add random noise to data and then learn to reverse the diffusion process to construct desired data samples from the noise. Unlike VAE or flow models, diffusion models are learned with a fixed procedure and the latent variable has high dimensionality (same as the original data).
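For reference, the Markov chain described above can be written in the notation of Ho et al. (2020). This is a summary of the paper's definitions, not a statement about how this repository names things: $\beta_t$ is the noise variance at step $t$, and $\mu_\theta$, $\Sigma_\theta$ are the mean and covariance predicted by the learned model.

$$
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right),
\qquad
q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})
$$

$$
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
$$

The first line is the fixed noising procedure; the second is the learned step that reverses it. Because every $x_t$ has the same shape as $x_0$, the latent variable keeps the dimensionality of the original data.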

Looking more closely, a diffusion model consists of two processes, as shown in the image below:

Forward process (with red lines).
Reverse process (with blue lines).
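The two processes can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code used in this repository: it works on a list of floats instead of PyTorch tensors to stay dependency-free, it uses the linear schedule from the DDPM paper (T = 1000, beta from 1e-4 to 0.02), and the names `linear_beta_schedule`, `cumulative_alphas`, `q_sample`, `p_sample_step`, and `predict_noise` are hypothetical. In a real model, `predict_noise` would be the trained network.

```python
import math
import random

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (values from the DDPM paper)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng=random):
    """Forward process: draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal * x + noise_scale * n for x, n in zip(x0, noise)]
    return x_t, noise

def p_sample_step(x_t, t, betas, alpha_bars, predict_noise, rng=random):
    """Reverse process: one step x_t -> x_{t-1}, using sigma_t^2 = beta_t."""
    beta = betas[t]
    eps = predict_noise(x_t, t)  # assumption: stands in for the trained network
    coef = beta / math.sqrt(1.0 - alpha_bars[t])
    mean = [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(x_t, eps)]
    if t == 0:
        return mean  # no noise is added on the final step
    sigma = math.sqrt(beta)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]

betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
x0 = [0.5, -0.25, 1.0]  # toy "image" with three pixels
x_late, _ = q_sample(x0, len(betas) - 1, alpha_bars)  # close to pure noise
```

Generation would start from pure Gaussian noise and call `p_sample_step` for t = T-1 down to 0. The forward function needs no loop because the sum of Gaussian steps has a closed form in terms of `alpha_bars`.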

Usage

There are two ways to run the model with an interface:

Using Gradio: run the command python sd_gradio.py
Using FastAPI: enter the directory 'stable_diffusion_api', then run the command uvicorn sd_api:app --reload (this assumes you have already downloaded the checkpoints). The FastAPI option provides an engine-style workflow: create your project, select your hardware, and then open the generate image page.

Note

Download vocab.json and merges.txt from huggingface/stable_diffusion.tokenizer and v1-5-pruned-emaonly.ckpt from huggingface/stable_diffusion.checkpoints, and save them in the data folder.
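Before launching either interface, it can help to confirm that the three files are in place. The following is a small sketch, not part of this repository: the helper name `missing_files` is hypothetical, and it assumes the data folder is a directory named data relative to where the script is run.

```python
from pathlib import Path

# File names taken from the note above.
REQUIRED_FILES = ("vocab.json", "merges.txt", "v1-5-pruned-emaonly.ckpt")

def missing_files(data_dir="data", required=REQUIRED_FILES):
    """Return the required file names that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in required if not (folder / name).is_file()]

if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from the data folder:", ", ".join(missing))
    else:
        print("All tokenizer and checkpoint files found.")
```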

Results:


Prompt: A cat stretching on the floor, highly detailed, ultra sharp, cinematic, 100mm lens, 8k resolution
Prompt: A dog reading a book, wearing glasses, comfy hat, highly detailed, ultra sharp, cinematic, 100mm lens, 8k resolution
Prompt: A dog stretching on the floor wearing sunglasses, looking to the camera, highly detailed, ultra sharp, cinematic, 100mm lens, 8k resolution
Prompt: An astronaut on the moon, highly detailed, ultra sharp, cinematic, 100mm lens, 8k resolution
Prompt: A black dog sitting between a bush and a pair of green pants standing up with nobody inside them, highly detailed, ultra sharp, cinematic, 100mm lens, 8k resolution

@misc{ho2020denoising,
    title   = {Denoising Diffusion Probabilistic Models},
    author  = {Jonathan Ho and Ajay Jain and Pieter Abbeel},
    year    = {2020},
    eprint  = {2006.11239},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{https://doi.org/10.48550/arxiv.2204.11824,
  doi = {10.48550/ARXIV.2204.11824},
  url = {https://arxiv.org/abs/2204.11824},
  author = {Blattmann, Andreas and Rombach, Robin and Oktay, Kaan and Ommer, Björn},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Retrieval-Augmented Diffusion Models},
  publisher = {arXiv},
  year = {2022},  
  copyright = {arXiv.org perpetual, non-exclusive license}
}

About

🎨 "Denoising Diffusion Probabilistic Models" paper implementation. A Stable Diffusion engine using PyTorch as the backend and FastAPI as the frontend (using JavaScript), and also providing a Gradio interface.

MIT License
