When the API is called multiple times, the GPU memory continuously increases until it overflows.
jiangerxiaozhao opened this issue
jiangerxiaozhao commented
Reminder
- I have searched the GitHub Discussions and issues and have not found anything similar to this.
Environment
- OS: Ubuntu 20.04.5 LTS
- Python: 3.10
- PyTorch: 2.2.2+cu121
- CUDA: 12.1
Current Behavior
When the API is called multiple times, GPU memory usage keeps increasing with each call until the GPU runs out of memory.
https://github.com/01-ai/Yi/tree/main/VL#api
Model: Yi-VL-6B
Expected Behavior
No response
Steps to Reproduce
Yi-VL-6B
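
The report gives no concrete reproduction steps, so here is a minimal sketch of how the growth could be measured. It is not taken from the Yi repository. The helper names `sample_memory` and `grows_every_call` are hypothetical, and the fake request at the bottom only stands in for a real API call. In a real check, `read_memory` could be `torch.cuda.memory_allocated`, which would have to run inside the server process that holds the model.

```python
from typing import Callable, List


def sample_memory(call: Callable[[], object],
                  read_memory: Callable[[], int],
                  n_calls: int) -> List[int]:
    """Run `call` n_calls times and record the memory reading after each call."""
    samples = []
    for _ in range(n_calls):
        call()
        samples.append(read_memory())
    return samples


def grows_every_call(samples: List[int]) -> bool:
    """Return True if every reading is strictly larger than the previous one."""
    return len(samples) > 1 and all(b > a for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    # Stand-in for a leaking endpoint: each "request" retains 1 KiB forever.
    # Assumption: with the real service, `call` would send one API request and
    # `read_memory` would be torch.cuda.memory_allocated in the server process.
    retained = []

    def fake_request() -> None:
        retained.append(bytearray(1024))

    def fake_memory() -> int:
        return sum(len(chunk) for chunk in retained)

    readings = sample_memory(fake_request, fake_memory, 5)
    print(readings, grows_every_call(readings))
```

If the readings rise after every request and never fall back, that pattern matches the behavior described above. A flat or sawtooth pattern would instead point at normal allocator caching.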
Anything Else?
No response
Guofeng Yi commented
Thank you for pointing this out; I will fix it. In the meantime, LMDeploy supports deploying our VL model, so you can try that.