Optimized runtime inference
jerrymatjila opened this issue
Jerry Matjila commented
I'm looking for advice. In your experience, which engine provides better-optimized runtime inference on NVIDIA GPUs: vLLM, TensorRT-LLM, or any other engine you have encountered?
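For concreteness, here is a minimal sketch of the kind of throughput measurement I have in mind, assuming vLLM is installed (`pip install vllm`) and a CUDA GPU is available; the model name is only an example:

```python
import time
from vllm import LLM, SamplingParams

# A batch of identical prompts, just to exercise batched generation.
prompts = ["Explain KV caching in one sentence."] * 64
params = SamplingParams(temperature=0.0, max_tokens=128)

# Example model; substitute whatever model you are benchmarking.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests and report throughput.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.0f} tok/s")
```

A fair comparison would push the same prompts and sampling settings through the TensorRT-LLM runtime on the same hardware and compare tokens/s and per-request latency at the same batch size.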
stale commented
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.