OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Does minicpm-llama3-v-2_5 (int4) support concurrent API calls? Two or more concurrent calls raise an error

geminizyz opened this issue

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

What do I need to do to get concurrency working? I'm on a single physical machine with a 24 GB GPU. Resources are limited, but I'd like to support at least 2 concurrent requests.
(Two screenshots of the error were attached.)
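For reference, a minimal workaround sketch, not an official answer from the maintainers: the Hugging Face `generate` path behind `model.chat` is not safe to call concurrently on a single model instance, so serializing inference with a lock lets two or more clients queue instead of crashing. The checkpoint name and `chat` arguments follow the model card; the wrapper function is an assumption.

```python
# Sketch: serialize concurrent requests against one MiniCPM-Llama3-V 2.5 (int4)
# instance. Assumes the HF checkpoint "openbmb/MiniCPM-Llama3-V-2_5-int4";
# a 24 GB GPU fits one int4 copy, so requests must queue rather than overlap.
import threading

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "openbmb/MiniCPM-Llama3-V-2_5-int4"

model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model.eval()

# One lock guards the single model instance: concurrent callers block here
# instead of corrupting each other's generation state.
_infer_lock = threading.Lock()

def chat(image: Image.Image, question: str) -> str:
    msgs = [{"role": "user", "content": question}]
    with _infer_lock, torch.no_grad():
        return model.chat(image=image, msgs=msgs, tokenizer=tokenizer,
                          sampling=True, temperature=0.7)
```

The lock only removes the crash; true concurrency (batched decoding serving several requests at once) would need a batching-capable serving stack rather than a mutex around a single `chat` call.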

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Python: 3.10
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.1

Anything else?

No response

Do you know which quantization method was used to produce minicpm-llama3-v-2_5 (int4)?

BnB

Do you mean BitsAndBytes?
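"BnB" here is bitsandbytes. For context, a sketch of how a transformers checkpoint is loaded in 4-bit with bitsandbytes; the specific settings used for the official int4 release are assumptions, not confirmed by the maintainers:

```python
# Sketch: 4-bit (BnB) loading of the fp16 checkpoint via transformers.
# The quant settings (nf4, double quant, fp16 compute) are assumptions.
import torch
from transformers import AutoModel, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # bitsandbytes 4-bit weights
    bnb_4bit_quant_type="nf4",             # NormalFloat4 (assumed)
    bnb_4bit_use_double_quant=True,        # quantize the quant constants (assumed)
    bnb_4bit_compute_dtype=torch.float16,  # matmuls still run in fp16
)

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5",
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="cuda:0",
)
```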


I'm running into the same issue.