01-ai / Yi

A series of large language models trained from scratch by developers @01-ai

Home Page: https://01.ai

When the API is called multiple times, the GPU memory continuously increases until it overflows.

jiangerxiaozhao opened this issue

Reminder

  • I have searched the GitHub Discussions and issues and have not found anything similar to this.

Environment

- OS: Ubuntu 20.04.5 LTS
- Python: 3.10
- PyTorch: 2.2.2+cu121
- CUDA: 12.1

Current Behavior

When the API is called multiple times, the GPU memory continuously increases until it overflows.
https://github.com/01-ai/Yi/tree/main/VL#api
Model: Yi-VL-6B
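
One way to confirm this pattern is to record GPU memory after each consecutive request and check whether it only ever climbs. The helper below is a hypothetical sketch, not part of the Yi repository: it assumes you have already collected one reading per call, for example `torch.cuda.memory_allocated() / 2**20` logged on the server side, and that the first call may legitimately allocate one-off buffers (treated as warm-up). The name `looks_like_leak` and its thresholds are illustrative assumptions.

```python
from typing import Sequence


def looks_like_leak(
    readings_mib: Sequence[float],
    warmup: int = 1,
    min_net_growth_mib: float = 100.0,
) -> bool:
    """Heuristic leak check on per-call GPU memory readings (hypothetical helper).

    readings_mib: memory in MiB sampled after each API call, in order
    (assumed to come from e.g. torch.cuda.memory_allocated() / 2**20).
    warmup: number of initial readings to ignore, since the first request
    often allocates caches or workspaces once.
    min_net_growth_mib: minimum total growth after warm-up to call it a leak.
    """
    steady = list(readings_mib)[warmup:]
    if len(steady) < 2:
        return False  # not enough post-warm-up samples to see a trend
    deltas = [b - a for a, b in zip(steady, steady[1:])]
    # A leak never gives memory back between calls...
    never_released = all(d >= 0 for d in deltas)
    # ...and grows by a meaningful amount overall.
    return never_released and (steady[-1] - steady[0]) >= min_net_growth_mib
```

For example, readings of `[5000, 13000, 13200, 13400, 13600, 13800]` are flagged, while `[5000, 13000, 13050, 13000, 13040]` are not, because memory drops back between calls. Note that `torch.cuda.empty_cache()` only returns cached, unused blocks to the driver; if tensors are still referenced after each request, the readings keep growing regardless.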

Expected Behavior

No response

Steps to Reproduce

Yi-VL-6B

Anything Else?

No response

Thank you for pointing this out; I will fix it. In addition, LMDeploy supports deploying our VL model, so you can try that as well.