kamalkraj / stable-diffusion-tritonserver

Deploy stable diffusion model with onnx/tensorrt + tritonserver

stabilityai/stable-diffusion-2-1-base has a worse response time than StableDiffusionPipeline

tofulim opened this issue · comments

Hi!
I just tested your approach, but I got a worse response time.
I'm opening this issue because there may be something wrong in my code or logic,
or I may be using these tools inappropriately.

environment

  • Ubuntu 18.04
  • T4
  • torch == 1.11.0+cu113
  • optimum == 1.4.0
  • onnx == 1.12.0
  • Python 3.8.10
  • triton 22.01

I exported stabilityai/stable-diffusion-2-1-base with convert_stable_diffusion_checkpoint_to_onnx.py and used your model directory, fixing some of the pbtxt dimensions (checked roughly as sketched below).
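As a rough check (the output path here is just a placeholder, not the repo's layout), the exported ONNX inputs can be inspected to see which pbtxt dims need to change, e.g. the encoder_hidden_states width moves from 768 in SD 1.x to 1024 in SD 2.1:

import onnx

# Path is a placeholder; point it at the exported UNet (or text encoder) ONNX file.
model = onnx.load("sd21-onnx/unet/model.onnx")

# Print each graph input with its declared shape; the dims in the matching
# config.pbtxt have to agree with whatever is printed here (symbolic dims show as -1).
for inp in model.graph.input:
    dims = [d.dim_value if d.dim_value > 0 else -1
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)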

and I added noise_pred = noise_pred.to("cuda") at this line: link
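To illustrate where that line goes, here is a minimal self-contained sketch of a single denoising step (the scheduler choice and shapes are assumptions for illustration, not the repo's actual code): the UNet output comes back from the Triton client as a CPU numpy array, so it has to be moved back to the GPU before the scheduler step.

import numpy as np
import torch
from diffusers import DDIMScheduler  # scheduler choice is only for illustration

scheduler = DDIMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)

# Stand-in for the UNet output returned by the Triton client (lives in CPU memory).
unet_output = np.zeros((1, 4, 64, 64), dtype=np.float32)
latents = torch.randn(1, 4, 64, 64, device="cuda")
t = scheduler.timesteps[0]

# Without .to("cuda") the prediction stays on CPU while the latents are on the GPU,
# which either errors out or forces extra device transfers every step.
noise_pred = torch.from_numpy(unet_output).to("cuda")
latents = scheduler.step(noise_pred, t, latents).prev_sample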

and the Triton server started up and loaded the models like this:
(screenshot of the running Triton server)

Then I ran inference with these prompts:

prompts = [
    "A man standing with a red umbrella",
    "A child standing with a green umbrella",
    "A woman standing with a yellow umbrella"
]

and I get a response after about 6.8 seconds (average of 3 inferences, measured roughly as sketched below).
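For reference, the average is just wall-clock time over the three requests, roughly like this (run_inference is a stand-in for whatever issues the Triton request; warm-up runs are not shown):

import time

def average_latency(run_inference, prompts, runs=3):
    # Average wall-clock latency over a few runs.
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference(prompts)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)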

The strange thing is that when I feed the same prompts to StableDiffusionPipeline, it takes about 5 seconds.
Of course, it was run in the same environment, and it is also served from Triton Inference Server
(but I maximized StableDiffusionPipeline's performance with some tips from the diffusers docs: link; roughly the settings sketched below).
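For reference, the optimizations are roughly the ones below (a sketch on my side: fp16 weights and attention slicing are the kind of thing the diffusers performance docs suggest; the exact settings are assumptions, not copied from the docs):

import torch
from diffusers import StableDiffusionPipeline

# Load the pipeline with fp16 weights and keep everything on the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Attention slicing lowers peak memory (useful on a T4) at a small speed cost.
pipe.enable_attention_slicing()

image = pipe("A man standing with a red umbrella").images[0]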

Is serving the Stable Diffusion model as ONNX actually faster than using StableDiffusionPipeline?
I expected better performance, since it takes much more effort to serve it this way..