When the API is called multiple times, the GPU memory continuously increases until it overflows.

Question

jiangerxiaozhao opened this issue 6 months ago

I have searched the GitHub Discussions and Issues and have not found anything similar to this.

- OS: Ubuntu 20.04.5 LTS
- Python: 3.10
- PyTorch: 2.2.2+cu121
- CUDA: 12.1

When the API is called multiple times, GPU memory usage keeps increasing until the GPU runs out of memory.
API: https://github.com/01-ai/Yi/tree/main/VL#api
Model: Yi-VL-6B


Guofeng Yi · Answer 1 · Sun Apr 14 2024 15:28:00 GMT+0800 (China Standard Time)

Thank you for pointing this out. I will fix it. In addition, LMDeploy supports deploying our VL model, so you can try that as well.
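
The thread does not identify the cause of the growth, so the following is only a sketch of something worth ruling out in the meantime: tensors or autograd state from earlier requests staying alive between calls. It assumes the server runs on PyTorch (as in the environment listed above). `infer_fn` is a hypothetical stand-in for whatever function handles one request; it is not a name from the Yi-VL code. `run_with_cleanup` and `release_gpu_memory` are likewise illustrative helpers.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch load on machines without PyTorch
    torch = None


def release_gpu_memory():
    """Collect garbage and return cached CUDA blocks to the driver.

    Returns the bytes still allocated by live tensors (0 without CUDA),
    which can be logged per request to see whether usage keeps climbing.
    """
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
        return torch.cuda.memory_allocated()
    return 0


def run_with_cleanup(infer_fn, *args, **kwargs):
    """Run one request without autograd tracking, then release cached memory.

    `infer_fn` is a hypothetical per-request inference function.
    """
    try:
        if torch is not None:
            # inference_mode prevents autograd graphs from being built and kept
            with torch.inference_mode():
                return infer_fn(*args, **kwargs)
        return infer_fn(*args, **kwargs)
    finally:
        release_gpu_memory()
```

If the value returned by `release_gpu_memory()` still rises after every request, some reference to earlier tensors is probably being retained (for example, in a growing conversation history or a global list), and emptying the cache alone will not help.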