Optimized runtime inference
jerrymatjila opened this issue
Jerry Matjila commented
I'm looking for advice. In your experience, which engine provides better-optimized runtime inference on NVIDIA GPUs: vLLM, TensorRT-LLM, or any other engine you have encountered?
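For concreteness, here is a minimal sketch of the kind of throughput measurement I have in mind, assuming vLLM is installed (`pip install vllm`) and a CUDA GPU is available; the model name is only an example:

```python
import time
from vllm import LLM, SamplingParams

# A batch of identical prompts, just to exercise batched generation.
prompts = ["Explain KV caching in one sentence."] * 64
params = SamplingParams(temperature=0.0, max_tokens=128)

# Example model; substitute whatever model you are benchmarking.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests and report throughput.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.0f} tok/s")
```

A fair comparison would push the same prompts and sampling settings through the TensorRT-LLM runtime on the same hardware and compare tokens/s and per-request latency at the same batch size.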
stale commented
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.