OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

[BUG] Multi-GPU deployment of OmniLMM-12B reports an error that all data must be on the same device

SKY072410 opened this issue · comments

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

Following the official multi-GPU deployment instructions, MiniCPM-Llama3-V deploys and runs inference successfully on two 16 GB RTX 3080 GPUs. However, when deploying OmniLMM-12B across multiple GPUs, even though the device map was set as instructed:
device_map["model.embed_tokens"] = 0
device_map["model.layers.0"] = 0
device_map["model.layers.31"] = 0
device_map["model.norm"] = 0
device_map["model.resampler"] = 0
device_map["model.vision_tower"] = 0
device_map["lm_head"] = 0
to keep the input and output modules on the same GPU, it still fails with an error that the data is not all on one device: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
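For reference, below is a minimal sketch of this kind of manual placement with `accelerate`. It is not the repository's exact loading code: the model id `openbmb/OmniLMM-12B`, the per-GPU memory budget, and the dtype are assumptions; the pinned module names are the ones listed in this issue.

```python
import torch
from transformers import AutoConfig, AutoModel
from accelerate import infer_auto_device_map, init_empty_weights

model_path = "openbmb/OmniLMM-12B"  # assumed Hugging Face model id

# Build an empty (meta) copy of the model just to infer a device map.
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
with init_empty_weights():
    empty_model = AutoModel.from_config(config, trust_remote_code=True)

# Let accelerate split the decoder layers over the two 16 GB cards.
device_map = infer_auto_device_map(
    empty_model,
    max_memory={0: "14GiB", 1: "14GiB"},  # assumed per-GPU budget
)

# Pin the input/output modules listed above to GPU 0 so the embeddings,
# final norm, resampler / vision tower, and lm_head all share one device.
for name in ["model.embed_tokens", "model.layers.0", "model.layers.31",
             "model.norm", "model.resampler", "model.vision_tower", "lm_head"]:
    device_map[name] = 0

# Load the real weights with that map.
model = AutoModel.from_pretrained(
    model_path,
    device_map=device_map,
    torch_dtype=torch.bfloat16,  # assumed dtype
    trust_remote_code=True,
)
```

With a map like this, the inputs produced by the tokenizer/processor still need to be moved to `cuda:0` before the forward pass, otherwise the same "two devices" error can appear.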

Expected Behavior

How can this problem be solved?

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

chat.py#L30-L31: changing `if False` to `if True` enables multi-GPU inference for OmniLMM-12B.

Simply changing False to True still raises an error; the tensors still don't seem to be on the same device. RuntimeError: Tensor on device cuda:0 is not on the expected device meta!
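A short diagnostic sketch for this kind of `meta`-device failure, assuming the model was loaded through `transformers`/`accelerate` with a `device_map` as above (the variable name `model` is an assumption):

```python
# Diagnostic sketch (assumes `model` was loaded with a device_map via transformers/accelerate).

# Show where accelerate actually placed each submodule.
for module_name, device in model.hf_device_map.items():
    print(module_name, "->", device)

# Any parameter still on the "meta" device was never materialized / dispatched,
# which is the usual cause of "is not on the expected device meta" errors.
for param_name, param in model.named_parameters():
    if param.device.type == "meta":
        print("still on meta:", param_name)
```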