Esmail-ibraheem / Stable-Diffusion-Pytorch

🎨 Implementation of the "Denoising Diffusion Probabilistic Models" paper. A Stable Diffusion engine that uses PyTorch as the backend and FastAPI with a JavaScript frontend, and also provides a Gradio interface.


Dali/DDPM


Diffusion Models — Introduction

Diffusion Models are generative models, meaning that they are used to generate data similar to the data on which they are trained. Fundamentally, Diffusion Models work by destroying training data through the successive addition of Gaussian noise, and then learning to recover the data by reversing this noising process. After training, we can use the Diffusion Model to generate data by simply passing randomly sampled noise through the learned denoising process.

Diffusion models are inspired by non-equilibrium thermodynamics. They define a Markov chain of diffusion steps to slowly add random noise to data and then learn to reverse the diffusion process to construct desired data samples from the noise. Unlike VAE or flow models, diffusion models are learned with a fixed procedure and the latent variable has high dimensionality (same as the original data).
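To make the noising chain concrete, here is a minimal sketch of the forward process in plain Python. It assumes the linear beta schedule from the DDPM paper (1e-4 to 0.02 over 1000 steps) and uses the closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The function names are hypothetical and are not taken from this repository, which works on PyTorch tensors rather than lists.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (DDPM paper defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot; returns (x_t, noise used)."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise


# Illustrative usage on a toy 3-element "image" (hypothetical values).
betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
noisy, eps = q_sample([0.5, -0.25, 1.0], t=500, alpha_bars=alpha_bars,
                      rng=random.Random(0))
```

Because alpha_bar_t shrinks toward zero as t grows, late steps are almost pure Gaussian noise, which is why sampling can start from random noise.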

Looking more closely, a diffusion model consists of two processes, as shown in the image below:

  • Forward process (with red lines).
  • Reverse process (with blue lines).
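As a rough illustration of the reverse process, the sketch below runs the ancestral sampling loop from the DDPM paper (Algorithm 2) in plain Python. It starts from Gaussian noise and repeatedly subtracts the predicted noise. The `predict_noise` callable stands in for the trained network and is a placeholder assumption, as are all function names. The variance choice sigma_t^2 = beta_t is one of the two options discussed in the paper.

```python
import math
import random


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear betas and their cumulative products alpha_bar_t."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def p_sample_step(xt, t, predicted_noise, betas, alpha_bars, rng):
    """One reverse step x_t -> x_{t-1} given the model's noise estimate."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    coef = beta_t / math.sqrt(1.0 - alpha_bars[t])
    mean = [(x - coef * e) / math.sqrt(alpha_t)
            for x, e in zip(xt, predicted_noise)]
    if t == 0:
        return mean  # no noise is added on the final step
    sigma_t = math.sqrt(beta_t)
    return [m + sigma_t * rng.gauss(0.0, 1.0) for m in mean]


def sample(predict_noise, dim, betas, alpha_bars, rng):
    """Start from pure noise and walk the chain from t = T-1 down to 0."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(betas))):
        x = p_sample_step(x, t, predict_noise(x, t), betas, alpha_bars, rng)
    return x


# Placeholder "model" that predicts zero noise; a real model is a trained network.
betas, alpha_bars = make_schedule(num_steps=50)
result = sample(lambda x, t: [0.0] * len(x), dim=4, betas=betas,
                alpha_bars=alpha_bars, rng=random.Random(0))
```

With a real noise predictor in place of the zero placeholder, the same loop turns random noise into a data sample.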

Usage

There are two ways to run the project with an interface:

  • Using Gradio: run the command python sd_gradio.py
  • Using FastAPI: enter the directory 'stable_diffusion_api', then run the command uvicorn sd_api:app --reload. This assumes that you have already downloaded the checkpoints and are inside that directory. The FastAPI option works like an engine: you create a project, select your hardware, and then reach the image generation page.


Note

Download vocab.json and merges.txt from huggingface/stable_diffusion.tokenizer, download v1-5-pruned-emaonly.ckpt from huggingface/stable_diffusion.checkpoints, and save all three files in the data folder.
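Before launching either interface, it can help to confirm the files are where they are expected. The following is a small sketch, not part of the repository: the helper name `missing_files` is hypothetical, and it assumes the data folder is named data and sits in the current working directory.

```python
from pathlib import Path

# File names taken from the note above.
REQUIRED_FILES = ("vocab.json", "merges.txt", "v1-5-pruned-emaonly.ckpt")


def missing_files(data_dir="data"):
    """Return the required files that are not present in data_dir."""
    data_path = Path(data_dir)
    return [name for name in REQUIRED_FILES
            if not (data_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from data folder:", ", ".join(missing))
    else:
        print("All tokenizer and checkpoint files found.")
```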


Results:

  • Prompt: A cat stretching on the floor, highly detailed, ultra sharp, cinematic, 100mm lens, 8k resolution
  • Prompt: A dog reading a book, wearing glasses, comfy hat, highly detailed, ultra sharp, cinematic, 100mm lens, 8k resolution
  • Prompt: A dog stretching on the floor wearing sunglasses, looking to the camera, highly detailed, ultra sharp, cinematic, 100mm lens, 8k resolution
  • Prompt: An astronaut on the moon, highly detailed, ultra sharp, cinematic, 100mm lens, 8k resolution
  • Prompt: A black dog sitting between a bush and a pair of green pants standing up with nobody inside them, highly detailed, ultra sharp, cinematic, 100mm lens, 8k resolution

@misc{ho2020denoising,
    title   = {Denoising Diffusion Probabilistic Models},
    author  = {Jonathan Ho and Ajay Jain and Pieter Abbeel},
    year    = {2020},
    eprint  = {2006.11239},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{https://doi.org/10.48550/arxiv.2204.11824,
  doi = {10.48550/ARXIV.2204.11824},
  url = {https://arxiv.org/abs/2204.11824},
  author = {Blattmann, Andreas and Rombach, Robin and Oktay, Kaan and Ommer, Björn},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Retrieval-Augmented Diffusion Models},
  publisher = {arXiv},
  year = {2022},  
  copyright = {arXiv.org perpetual, non-exclusive license}
}


License: MIT License


Languages

Python 88.1%, Jupyter Notebook 7.8%, HTML 2.7%, JavaScript 1.4%