Inference is very slow
Alkaiddd opened this issue
The inference process is currently quite slow. Are there any methods available to accelerate it?
For the action task, it takes about 9 s per sample.
Hi Alkaiddd,
Thank you for your feedback! This model is not designed for real-time applications, and running inference with a 7B model is challenging, especially on less powerful GPUs. We have achieved inference times of 1-2 seconds per sample using batch inference on NVIDIA A100 GPUs. There is definitely room for improvement, for example quantization, if inference speed matters for your use case.
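In case it helps, here is a minimal sketch of the two suggestions above (batching several samples per forward pass and 4-bit quantization) using Hugging Face `transformers` with `bitsandbytes`. The model ID and prompts are placeholders, and this assumes the checkpoint loads through `AutoModelForCausalLM`; adapt to the repo's actual loading code.

```python
# Sketch: batched generation with 4-bit quantized weights.
# Assumptions: model name is a placeholder, checkpoint is a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "your-org/your-7b-model"  # placeholder, substitute the real checkpoint

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights cut memory roughly 4x
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batched padding

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)

# Batch several samples into one generate() call instead of looping one by one.
prompts = ["<sample 1 prompt>", "<sample 2 prompt>"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

Left padding is used so that generation starts right after each prompt in the batch; quantization trades a little accuracy for lower memory and often higher throughput on smaller GPUs.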