OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Does minicpm-llama3-v-2_5 (int4) support concurrent API calls? Two or more concurrent calls raise an error

geminizyz opened this issue

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

What do I need to do to get concurrency working? I'm on a single physical machine with a 24 GB GPU. Resources are limited, but I'd like to support at least 2 concurrent requests.
(Two screenshots of the error were attached.)
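For reference, a minimal workaround sketch, not an official answer from the maintainers: the Hugging Face `generate` path behind `model.chat` is not safe to call concurrently on a single model instance, so serializing inference with a lock lets two or more clients queue instead of crashing. The checkpoint name and `chat` arguments follow the model card; the wrapper function is an assumption.

```python
# Sketch: serialize concurrent requests against one MiniCPM-Llama3-V 2.5 (int4)
# instance. Assumes the HF checkpoint "openbmb/MiniCPM-Llama3-V-2_5-int4";
# a 24 GB GPU fits one int4 copy, so requests must queue rather than overlap.
import threading

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "openbmb/MiniCPM-Llama3-V-2_5-int4"

model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model.eval()

# One lock guards the single model instance: concurrent callers block here
# instead of corrupting each other's generation state.
_infer_lock = threading.Lock()

def chat(image: Image.Image, question: str) -> str:
    msgs = [{"role": "user", "content": question}]
    with _infer_lock, torch.no_grad():
        return model.chat(image=image, msgs=msgs, tokenizer=tokenizer,
                          sampling=True, temperature=0.7)
```

The lock only removes the crash; true concurrency (batched decoding serving several requests at once) would need a batching-capable serving stack rather than a mutex around a single `chat` call.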

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Python: 3.10
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.1

Anything else?

No response

Do you know which quantization method was used to produce minicpm-llama3-v-2_5 (int4)?

BnB

Do you mean BitsAndBytes?
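"BnB" here is bitsandbytes. For context, a sketch of how a transformers checkpoint is loaded in 4-bit with bitsandbytes; the specific settings used for the official int4 release are assumptions, not confirmed by the maintainers:

```python
# Sketch: 4-bit (BnB) loading of the fp16 checkpoint via transformers.
# The quant settings (nf4, double quant, fp16 compute) are assumptions.
import torch
from transformers import AutoModel, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # bitsandbytes 4-bit weights
    bnb_4bit_quant_type="nf4",             # NormalFloat4 (assumed)
    bnb_4bit_use_double_quant=True,        # quantize the quant constants (assumed)
    bnb_4bit_compute_dtype=torch.float16,  # matmuls still run in fp16
)

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5",
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="cuda:0",
)
```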


I'm running into the same issue.