triton-inference-server / fastertransformer_backend

Does triton-inference-server only support Slurm for multi-node deployment?

Shuai-Xie opened this issue

Dear Developers:

I'm deploying a GPT model with triton-inference-server and fastertransformer_backend, following this tutorial: https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/gpt_guide.md#run-triton-server-on-multiple-nodes.

I have successfully implemented single-node deployment and run the identity test. However, as I moved forward, I found that multi-node serving requires Slurm, presumably as a counterpart to how multi-node training is launched.
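To make the multi-node part concrete, this is roughly the launch arithmetic I understood from the guide (a sketch only: the parallelism values, GPUs per node, and model path below are illustrative, and the one-tritonserver-process-per-node layout is my assumption):

```python
# Sketch of the multi-node launch arithmetic (values are illustrative).
tensor_para_size = 8        # from the FT model's config.pbtxt
pipeline_para_size = 2
gpus_per_node = 8

total_gpus = tensor_para_size * pipeline_para_size    # 16 GPUs in total
num_nodes = total_gpus // gpus_per_node               # -> 2 nodes

# Under a Slurm allocation of `num_nodes` nodes, the guide launches roughly
# one tritonserver process per node via MPI (assumed layout):
launch_cmd = (
    f"mpirun -n {num_nodes} "
    "/opt/tritonserver/bin/tritonserver "
    "--model-repository=/workspace/all_models/gpt"    # illustrative path
)
print(launch_cmd)
```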

So my question is: what is the right way to use triton-inference-server on a cluster?

Thanks a lot!

I don't know what the right way is for a cluster. You can ask in the tritonserver repo.

All platforms supported by tritonserver should be supported by the FT backend, except that we need some method to launch multiple processes for multi-node inference, which may not be covered by tritonserver directly.

Thanks for your kind advice! I'll ask this question in the tritonserver repo.

By the way, in a Kubernetes cluster, a Pod (of containers) can only be scheduled onto a single node, so I guess some effort may be needed to ship a multi-node inference workload to a cluster.
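If I understand the point about multi-process launch correctly, what is really needed is some launcher that starts one MPI rank per node; Slurm is just one such launcher. A minimal sketch, assuming OpenMPI is installed on every node, the nodes can reach each other (e.g., passwordless SSH), and the model repository is mounted at the same path everywhere (hostnames and paths are made up):

```python
import subprocess

# Sketch: launch one tritonserver rank per node without Slurm.
# Assumptions: OpenMPI on every node, passwordless SSH between them,
# and a shared model repository mounted at the same path on all nodes.
hosts = ["node-a", "node-b"]                           # hypothetical hostnames

cmd = [
    "mpirun",
    "-np", str(len(hosts)),
    "--host", ",".join(f"{h}:1" for h in hosts),       # one slot per node
    "/opt/tritonserver/bin/tritonserver",
    "--model-repository=/workspace/all_models/gpt",    # illustrative path
]
subprocess.run(cmd, check=True)
```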

I noticed that triton-inference-server/server#5627 was opened in the tritonserver repo, and as @krishung5 suggested, any multi-node or fastertransformer-specific questions should be asked here.

Is it possible to add examples showing how to use the fastertransformer backend for multi-node inference when tritonserver has been deployed through a Helm chart in a Kubernetes cluster?
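As a purely hypothetical sketch (not something I have seen documented): if the tritonserver pods were run as a StatefulSet behind a headless Service, their stable DNS names could be used to build the MPI host list for a launch like the one sketched above. All names here are made up:

```python
# Hypothetical sketch only; not an officially documented recipe.
# Assumption: pods come from a StatefulSet "ft-triton" exposed by a headless
# Service "ft-triton" in namespace "inference", so each pod has a stable
# DNS name: <statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local
statefulset = "ft-triton"
service = "ft-triton"
namespace = "inference"
replicas = 2

hosts = [
    f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local"
    for i in range(replicas)
]
print(",".join(f"{h}:1" for h in hosts))   # host list for mpirun --host
```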