[BUG] Multi-GPU deployment of OmniLMM-12B fails with an error that all data must be on the same device
SKY072410 opened this issue
Is there an existing issue / discussion for this?
- I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
- I have searched FAQ
Current Behavior
On two 16 GB 3080 GPUs, MiniCPM-Llama3-V can be deployed and run inference successfully following the official multi-GPU deployment guide. However, when deploying OmniLMM-12B across multiple GPUs, even after setting the device map as instructed:

```python
device_map["model.embed_tokens"] = 0
device_map["model.layers.0"] = 0
device_map["model.layers.31"] = 0
device_map["model.norm"] = 0
device_map["model.resampler"] = 0
device_map["model.vision_tower"] = 0
device_map["lm_head"] = 0
```

to keep the input and output modules on the same GPU, it still fails with a cross-device error: `Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!`
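For reference, the pinning pattern above can be expressed as a small helper that builds the full device map (a minimal sketch: `build_device_map` is a hypothetical function, the module names are taken from the snippet above, and a 32-layer decoder with two GPUs is assumed; the remaining middle layers are split evenly):

```python
def build_device_map(num_layers=32, num_gpus=2):
    """Pin the input/output modules plus the first and last decoder
    layers to GPU 0, then split the remaining layers across GPUs."""
    device_map = {
        "model.embed_tokens": 0,
        "model.norm": 0,
        "model.resampler": 0,
        "model.vision_tower": 0,
        "lm_head": 0,
        "model.layers.0": 0,
        f"model.layers.{num_layers - 1}": 0,
    }
    # Spread the middle layers (1 .. num_layers-2) in contiguous chunks.
    middle = list(range(1, num_layers - 1))
    per_gpu = (len(middle) + num_gpus - 1) // num_gpus
    for i, layer in enumerate(middle):
        device_map[f"model.layers.{layer}"] = min(i // per_gpu, num_gpus - 1)
    return device_map

device_map = build_device_map()
```

The resulting dict can then be passed as the `device_map` argument of `from_pretrained`. Note that even with this map, a module whose internal submodules were split further (or left on the `meta` device by the loader) can still trigger a cross-device error.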
Expected Behavior
How can this problem be solved?
Steps To Reproduce
No response
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):
Anything else?
No response
Changing the `if False` at chat.py#L30-L31 to `if True` enables multi-GPU inference for OmniLMM-12B.
Simply changing False to True still raises an error; the tensors still seem to be on different devices: `RuntimeError: Tensor on device cuda:0 is not on the expected device meta!`