Inference is very slow
Alkaiddd opened this issue
The inference process is currently quite slow. Are there any methods available to accelerate it?
For the action task, it takes about 9 s per sample.
Hi Alkaiddd,
Thank you for your feedback! This model is not designed for real-time applications, and running inference with a 7B model is challenging, especially on less powerful GPUs. We have achieved inference times of 1-2 seconds per sample using batch inference on NVIDIA A100 GPUs. There is definitely room for improvement, for example quantization, if inference speed matters for your use case.
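In case it helps, here is a minimal sketch of the two suggestions above (batching several samples per forward pass and 4-bit quantization) using Hugging Face `transformers` with `bitsandbytes`. The model ID and prompts are placeholders, and this assumes the checkpoint loads through `AutoModelForCausalLM`; adapt to the repo's actual loading code.

```python
# Sketch: batched generation with 4-bit quantized weights.
# Assumptions: model name is a placeholder, checkpoint is a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "your-org/your-7b-model"  # placeholder, substitute the real checkpoint

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights cut memory roughly 4x
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batched padding

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)

# Batch several samples into one generate() call instead of looping one by one.
prompts = ["<sample 1 prompt>", "<sample 2 prompt>"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=128)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

Left padding is used so that generation starts right after each prompt in the batch; quantization trades a little accuracy for lower memory and often higher throughput on smaller GPUs.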