stabilityai/stable-diffusion-2-1-base gets a worse response time than StableDiffusionPipeline
tofulim opened this issue
Hi!
I just tested your approach, but I got a worse response time. I'm opening this issue because there may be something wrong in my code or logic, or I may be using these tools incorrectly.
Environment
- Ubuntu 18.04
- NVIDIA T4
- torch == 1.11.0+cu113
- optimum == 1.4.0
- onnx == 1.12.0
- Python 3.8.10
- Triton 22.01
I exported stabilityai/stable-diffusion-2-1-base to ONNX with convert_stable_diffusion_checkpoint_to_onnx.py, used your model directory after fixing some pbtxt dimensions, and added the line `noise_pred = noise_pred.to("cuda")` at link.
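For reference, this is roughly how I ran the export; a minimal sketch, assuming the `--model_path`/`--output_path` flags of the diffusers conversion script and an output directory name of my own choosing, so double-check against the script version you have:

```python
# Minimal sketch: export stabilityai/stable-diffusion-2-1-base to ONNX using the
# diffusers conversion script (assumes the script sits in the current directory).
import subprocess

subprocess.run(
    [
        "python", "convert_stable_diffusion_checkpoint_to_onnx.py",
        "--model_path", "stabilityai/stable-diffusion-2-1-base",
        "--output_path", "./sd-2-1-base-onnx",  # hypothetical output dir
    ],
    check=True,  # raise if the export fails
)
```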
The Triton server started up as shown below.
Then I ran inference with these prompts (a minimal client sketch follows the timing below):
```python
prompts = [
    "A man standing with a red umbrella",
    "A child standing with a green umbrella",
    "A woman standing with a yellow umbrella",
]
```
and got a response after 6.8 s on average (3 inferences).
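Here is roughly how I call the server; a minimal sketch, where the model name ("stable_diffusion") and the tensor names ("PROMPT", "IMAGES") are assumptions on my part, so adjust them to match your config.pbtxt:

```python
# Minimal sketch: send one prompt to the Triton pipeline model over HTTP.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Triton expects string inputs as a BYTES tensor backed by an object array.
prompt = np.array(["A man standing with a red umbrella"], dtype=np.object_)
inp = httpclient.InferInput("PROMPT", [1], "BYTES")  # assumed input name
inp.set_data_from_numpy(prompt)

result = client.infer(model_name="stable_diffusion", inputs=[inp])  # assumed model name
images = result.as_numpy("IMAGES")  # assumed output name: decoded image tensor
```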
The strange thing is that when I feed the same prompts to StableDiffusionPipeline, it takes around 5 s. Of course this was measured in the same environment, and it is also served from Triton Inference Server (but I maximized StableDiffusionPipeline's performance with some tips from the diffusers docs: link).
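The pipeline setup inside that comparison server looks roughly like this; a minimal sketch assuming the fp16 + attention-slicing tips from the docs, not my exact script:

```python
# Minimal sketch of the "plain" StableDiffusionPipeline baseline I compared against.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",
    torch_dtype=torch.float16,  # fp16 weights, a standard speed tip from the docs
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()  # trades a little speed for memory on a T4

image = pipe("A man standing with a red umbrella").images[0]
```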
Is serving the Stable Diffusion model as ONNX supposed to be faster than using StableDiffusionPipeline? I expected better performance, given how much harder it is to serve...