How about using ModelMesh to serve thousands of Stable Diffusion models?
Jack47 opened this issue
Jack Chen commented
I want to use ModelMesh to serve thousands of Stable Diffusion models. Any advice would be appreciated~
- I'm using Triton as the serving runtime. Inference time is about 3~10s per request.
- I'm using Triton ensembles for business logic like auditing and watermarking; these may become standalone services in the future.
- Currently every model has its own Kubernetes Service and Ingress rules (see the sketch after this list).
Goals:
- achieve higher cluster resource utilization, especially of GPUs
- keep the latency of every inference request as low as possible
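
For reference, a minimal sketch of how one model could be registered with ModelMesh-Serving instead of getting its own Deployment, Service, and Ingress. The model name `sd-model-0001` and the bucket path `s3://models/sd-model-0001` are hypothetical placeholders, and it assumes the model has been exported to a Triton-supported format such as ONNX:

```yaml
# Hypothetical example: one InferenceService per model, served by ModelMesh.
# The deploymentMode annotation tells KServe to place this model on the
# shared ModelMesh pods rather than creating a dedicated deployment.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sd-model-0001                      # hypothetical; one CR per model
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: onnx                         # assumes a Triton-supported export
      runtime: triton-2.x                  # built-in Triton ServingRuntime
      storageUri: s3://models/sd-model-0001  # hypothetical bucket/path
```

With this pattern, all models share the ModelMesh runtime pods and a single serving endpoint; ModelMesh loads and unloads models across those pods on demand, which is what drives up GPU utilization, and requests select a model by name rather than by per-model Service or Ingress rule.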
Christian Kadner commented
@Jack47 -- were you able to use ModelMesh-Serving for your Stable Diffusion models? Did you run into any specific issues?
Wikipedia thinks it should look like this :-)
Jack Chen commented
Currently we don't use ModelMesh. Thanks for your response.