[BUG] Multi-GPU deployment of OmniLMM-12B fails with an error that all data must be on the same device
SKY072410 opened this issue
Is there an existing issue / discussion for this?
- I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
- I have searched FAQ
Current Behavior
On two 16 GB 3080 GPUs, MiniCPM-Llama3-V can be deployed and run inference successfully following the official multi-GPU deployment guide. However, when deploying OmniLMM-12B across multiple GPUs, even after setting the device map as instructed:

```python
device_map["model.embed_tokens"] = 0
device_map["model.layers.0"] = 0
device_map["model.layers.31"] = 0
device_map["model.norm"] = 0
device_map["model.resampler"] = 0
device_map["model.vision_tower"] = 0
device_map["lm_head"] = 0
```

to keep the input and output modules on the same GPU, it still fails with a cross-device error: `Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!`
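For reference, the pinning pattern above can be expressed as a small helper that builds the full device map (a minimal sketch: `build_device_map` is a hypothetical function, the module names are taken from the snippet above, and a 32-layer decoder with two GPUs is assumed; the remaining middle layers are split evenly):

```python
def build_device_map(num_layers=32, num_gpus=2):
    """Pin the input/output modules plus the first and last decoder
    layers to GPU 0, then split the remaining layers across GPUs."""
    device_map = {
        "model.embed_tokens": 0,
        "model.norm": 0,
        "model.resampler": 0,
        "model.vision_tower": 0,
        "lm_head": 0,
        "model.layers.0": 0,
        f"model.layers.{num_layers - 1}": 0,
    }
    # Spread the middle layers (1 .. num_layers-2) in contiguous chunks.
    middle = list(range(1, num_layers - 1))
    per_gpu = (len(middle) + num_gpus - 1) // num_gpus
    for i, layer in enumerate(middle):
        device_map[f"model.layers.{layer}"] = min(i // per_gpu, num_gpus - 1)
    return device_map

device_map = build_device_map()
```

The resulting dict can then be passed as the `device_map` argument of `from_pretrained`. Note that even with this map, a module whose internal submodules were split further (or left on the `meta` device by the loader) can still trigger a cross-device error.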
Expected Behavior
How can this problem be solved?
Steps To Reproduce
No response
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):
Anything else?
No response
Changing the `if False` at chat.py#L30-L31 to `if True` enables multi-GPU inference for OmniLMM-12B.
Simply changing False to True still raises an error; the tensors still seem to be on different devices: `RuntimeError: Tensor on device cuda:0 is not on the expected device meta!`