yujichai / x-stable-diffusion

Real-time inference for Stable Diffusion

Home Page: https://stochastic.ai



⚡️ Real-time inference for Stable Diffusion

[Demo: real-time Stable Diffusion inference on Stochastic.ai]

Stochastic provides this repository to perform fast real-time inference with Stable Diffusion. The goal of this repo is to collect and document as many optimization techniques for Stable Diffusion as possible.

Currently this repository includes 4 optimization techniques, with more in the pipeline. Feel free to open a PR to add a new optimization technique to the repository.

Table of contents:

  • 🔥 Optimizations
  • Benchmark result
  • 🚀 Quickstart
  • Manual deployment
  • ✅ Stochastic

🔥 Optimizations

The techniques collected so far, each in its own directory:

  • AITemplate
  • FlashAttention
  • nvFuser
  • TensorRT

PyTorch fp16 serves as the unoptimized baseline.

Benchmark result

Here are some benchmark results on a single 40GB A100 GPU with CUDA 11.6.

All benchmarks are averaged over 50 runs:

Running args: {'max_seq_length': 64, 'num_inference_steps': 50, 'image_size': (512, 512)}

Latency in seconds on 1x 40GB A100 GPU, batch size = 1 (a minimal timing sketch follows the table):

Project               Latency (s)   GPU VRAM (GB)
PyTorch fp16          5.77          10.3
nvFuser fp16          3.15          ---
FlashAttention fp16   2.80          7.5
TensorRT fp16         1.68          8.1
AITemplate fp16       1.38          4.83
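The minimal Python sketch below illustrates the methodology described above: averaging latency over 50 runs (a warm-up pass is added here as a common-practice assumption). The diffusers pipeline and model ID are assumptions for illustration, not the repo's exact benchmark scripts, which live in the per-technique directories.

import torch
from diffusers import StableDiffusionPipeline

# Load the fp16 baseline pipeline; the model ID is an illustrative assumption.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "Riding a horse"

# Warm-up run so one-time setup cost does not skew the measurement.
pipe(prompt, num_inference_steps=50, height=512, width=512)

# Average latency over 50 runs, synchronizing so the timer sees the GPU work.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
total_ms = 0.0
for _ in range(50):
    start.record()
    pipe(prompt, num_inference_steps=50, height=512, width=512)
    end.record()
    torch.cuda.synchronize()
    total_ms += start.elapsed_time(end)

print(f"Average latency: {total_ms / 50 / 1000:.2f} s")
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")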

Batched Version

Results for different batch sizes, with each cell reporting latency/VRAM:

Running args: {'max_seq_length': 64, 'num_inference_steps': 50, 'image_size': (512, 512)}

Project \ bs          1              4              8               16           24
PyTorch fp16          5.77s/10.3GB   19.2s/18.5GB   36s/26.7GB      OOM          OOM
AITemplate fp16       1.42s/4.83GB   4.25s/8.5GB    7.4s/14.5GB     15.7s/25GB   23.4s/36GB
TensorRT fp16         1.68s/8.1GB    OOM            OOM             OOM          OOM
FlashAttention fp16   2.8s/7.5GB     9.1s/17GB      17.7s/29.5GB    OOM          OOM

Note: TensorRT runs out of memory during the conversion stage, which converts the UNet model from ONNX to TensorRT.
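To make the batched numbers concrete, here is a minimal sketch of batched inference with a diffusers pipeline. It reuses the hypothetical `pipe` from the timing sketch above; both the pipeline and model ID are assumptions, not the repo's exact scripts.

# Replicate the prompt across the batch; a single pipeline call then runs
# the UNet on a batch of latents, trading VRAM for per-image throughput
# (see the table above for the VRAM each batch size needs).
bs = 4  # batch size
images = pipe(["Riding a horse"] * bs,
              num_inference_steps=50, height=512, width=512).images
print(f"Generated {len(images)} images")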

🚀 Quickstart

Make sure you have Python and Docker installed on your system.

  1. Install the latest version of the stochasticx library:
pip install stochasticx
  2. Deploy the Stable Diffusion model:
stochasticx stable-diffusion deploy --type aitemplate

If you don't have a Stochastic account, the CLI will prompt you to quickly create one. It is free and takes just one minute: Sign up →

Alternatively, you can deploy Stable Diffusion without our CLI by following the steps here.

To infer with this deployed model:

stochasticx stable-diffusion infer --prompt "Riding a horse"

Check all the options of the infer command:

stochasticx stable-diffusion infer --help

You can get the logs of the deployment by executing the following command:

stochasticx stable-diffusion logs

Stop and remove the deployment with this command:

stochasticx stable-diffusion stop

Manual deployment

Check the README.md of the following directories (a minimal FlashAttention sketch follows this list):

  • AITemplate
  • FlashAttention
  • nvFuser
  • PyTorch
  • TensorRT
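As a concrete taste of one listed technique, FlashAttention-style memory-efficient attention can be enabled in a diffusers pipeline through xformers. This is a hedged sketch under that assumption, not the repo's exact integration (see the FlashAttention directory for that); the model ID is illustrative.

import torch
from diffusers import StableDiffusionPipeline

# Load an fp16 pipeline; the model ID is an assumption for illustration.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap in xformers' memory-efficient (Flash) attention kernels.
# Requires the xformers package to be installed.
pipe.enable_xformers_memory_efficient_attention()

image = pipe("Riding a horse", num_inference_steps=50).images[0]
image.save("horse.png")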

✅ Stochastic

Stochastic was founded with a vision to make deep learning optimization and deployment effortless. We make it easy to ship state-of-the-art AI models with production-grade performance.

Features

  • Auto-optimization of deep learning models
  • Benchmarking of models and hardware on different evaluation metrics
  • Auto-scaling hosted and on-prem accelerated inference for models like BLOOM 176B, Stable Diffusion, and GPT-J. Enquire →
  • Support for AWS, GCP, Azure clouds and Kubernetes clusters


License: Apache License 2.0

