triton-inference-server / fastertransformer_backend

Does triton-inference-server only support Slurm for multi-node deployment?

Shuai-Xie opened this issue

Dear Developers:

I'm deploying a GPT model with triton-inference-server and fastertransformer_backend, following this tutorial: https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/gpt_guide.md#run-triton-server-on-multiple-nodes.

I have successfully implemented single-node deployment and run the identity test. However, as I moved forward, I found that multi-node serving requires Slurm, presumably as a counterpart to how multi-node training is launched.
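To make the multi-node part concrete, this is roughly the launch arithmetic I understood from the guide (a sketch only: the parallelism values, GPUs per node, and model path below are illustrative, and the one-tritonserver-process-per-node layout is my assumption):

```python
# Sketch of the multi-node launch arithmetic (values are illustrative).
tensor_para_size = 8        # from the FT model's config.pbtxt
pipeline_para_size = 2
gpus_per_node = 8

total_gpus = tensor_para_size * pipeline_para_size    # 16 GPUs in total
num_nodes = total_gpus // gpus_per_node               # -> 2 nodes

# Under a Slurm allocation of `num_nodes` nodes, the guide launches roughly
# one tritonserver process per node via MPI (assumed layout):
launch_cmd = (
    f"mpirun -n {num_nodes} "
    "/opt/tritonserver/bin/tritonserver "
    "--model-repository=/workspace/all_models/gpt"    # illustrative path
)
print(launch_cmd)
```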

So my question is: what is the right way to use triton-inference-server on a cluster?

Thanks a lot!

I don't know what the right way is for a cluster. You can ask in the tritonserver repo.

All platforms supported by tritonserver should be supported by the FT backend, except that we need some method to launch multiple processes for multi-node inference, which may not be covered by tritonserver directly.

Thanks for your kind advice! I'll ask this question in the tritonserver repo.

By the way, in a Kubernetes cluster, a Pod (of containers) can only be scheduled onto a single node, so I guess some effort may be needed to ship a multi-node inference workload to a cluster.
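If I understand the point about multi-process launch correctly, what is really needed is some launcher that starts one MPI rank per node; Slurm is just one such launcher. A minimal sketch, assuming OpenMPI is installed on every node, the nodes can reach each other (e.g., passwordless SSH), and the model repository is mounted at the same path everywhere (hostnames and paths are made up):

```python
import subprocess

# Sketch: launch one tritonserver rank per node without Slurm.
# Assumptions: OpenMPI on every node, passwordless SSH between them,
# and a shared model repository mounted at the same path on all nodes.
hosts = ["node-a", "node-b"]                           # hypothetical hostnames

cmd = [
    "mpirun",
    "-np", str(len(hosts)),
    "--host", ",".join(f"{h}:1" for h in hosts),       # one slot per node
    "/opt/tritonserver/bin/tritonserver",
    "--model-repository=/workspace/all_models/gpt",    # illustrative path
]
subprocess.run(cmd, check=True)
```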

I noticed that triton-inference-server/server#5627 was opened in the tritonserver repo, and as @krishung5 suggested, any multi-node or fastertransformer-specific questions should be asked here.

Is it possible to add examples showing how to use the fastertransformer backend for multi-node inference when tritonserver has been deployed through a Helm chart in a Kubernetes cluster?
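As a purely hypothetical sketch (not something I have seen documented): if the tritonserver pods were run as a StatefulSet behind a headless Service, their stable DNS names could be used to build the MPI host list for a launch like the one sketched above. All names here are made up:

```python
# Hypothetical sketch only; not an officially documented recipe.
# Assumption: pods come from a StatefulSet "ft-triton" exposed by a headless
# Service "ft-triton" in namespace "inference", so each pod has a stable
# DNS name: <statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local
statefulset = "ft-triton"
service = "ft-triton"
namespace = "inference"
replicas = 2

hosts = [
    f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local"
    for i in range(replicas)
]
print(",".join(f"{h}:1" for h in hosts))   # host list for mpirun --host
```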