kserve / modelmesh

Distributed Model Serving Framework


How about using ModelMesh to serve thousands of Stable Diffusion models

Jack47 opened this issue · comments

I want to use ModelMesh to serve thousands of Stable Diffusion models. Any advice would be appreciated~

  1. I'm using Triton as the serving runtime. Inference time is about 3 to 10 seconds per request.
  2. I'm using a Triton ensemble to run business logic such as auditing and watermarking; these may become standalone services in the future.
  3. Currently, every model has its own Kubernetes Service and Ingress rules.
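With ModelMesh, the per-model Service and Ingress could be replaced by one `InferenceService` per model, all sharing a common pool of Triton runtime pods. A minimal sketch of what such a manifest might look like, assuming ModelMesh-Serving's deployment-mode annotation and a Triton `ServingRuntime`; the model name, runtime name, and storage key are illustrative, not taken from this thread:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sd-model-001                      # hypothetical model name
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      runtime: triton-2.x                 # assumed Triton ServingRuntime name
      modelFormat:
        name: triton                      # ensemble served as a Triton model repo
      storage:
        key: localMinIO                   # hypothetical storage-config secret key
        path: models/sd-model-001         # path to the model in object storage
```

ModelMesh would then load and evict models across the shared runtime pods on demand, so thousands of registered models do not each need a dedicated GPU pod, Service, or Ingress rule.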

Goals:

  1. Achieve higher cluster resource utilization, especially of GPUs.
  2. Keep the latency of every inference request as low as possible.
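Once models are registered, clients would hit a single shared endpoint and select the model by name using the KServe v2 inference protocol. A minimal sketch of building such a request body in Python; the tensor name `PROMPT` is an assumption and would have to match the ensemble's input name in the Triton model configuration:

```python
import json


def build_infer_request(prompt: str) -> dict:
    """Build a KServe v2 REST inference payload for a text-to-image model.

    The input tensor name "PROMPT" is hypothetical -- it must match the
    input declared in the Triton ensemble's config.pbtxt.
    """
    return {
        "inputs": [
            {
                "name": "PROMPT",      # assumed ensemble input name
                "shape": [1],          # one prompt string
                "datatype": "BYTES",   # v2 protocol type for strings
                "data": [prompt],
            }
        ]
    }


# The payload would be POSTed to /v2/models/<model-name>/infer on the
# shared ModelMesh endpoint; serialize it to JSON for the request body.
payload = json.dumps(build_infer_request("a watercolor fox"))
```

Routing by model name in the request path, rather than by per-model Ingress rules, is what lets all models share one Service.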

@Jack47 -- were you able to use ModelMesh-Serving for your stable diffusion models? Did you run into any specific issues?

Wikipedia thinks it should look like this :-)

(image attachment)

Currently we don't use ModelMesh. Thanks for your response.