Zasder3 / train-CLIP

A PyTorch Lightning solution to training OpenAI's CLIP from scratch.

CUDA error: device-side assert triggered when training from scratch on multiple GPUs

zhouwei5113 opened this issue

Error message:
File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2846, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: CUDA error: device-side assert triggered

Training script:
python train.py --folder data_dir --model_name ViT-B/32 --batch_size 1024 --gpus 4 --strategy ddp --num_workers 16

How can this be fixed? (Training on a single GPU works without any problems.)
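
A device-side assert raised inside `cross_entropy` usually means that some target index fell outside `[0, num_classes)`. Whether that is the cause here is not confirmed by anything in this issue. The following is only a minimal sketch of a check that could be run on the labels just before the loss. `find_out_of_range_targets` is a hypothetical helper, not part of train-CLIP or PyTorch, and it assumes the logits have shape `(batch, num_classes)`, so that `num_classes` would be `logits.shape[1]` and the targets could be passed as `labels.tolist()`.

```python
from typing import Iterable, List, Tuple


def find_out_of_range_targets(
    targets: Iterable[int],
    num_classes: int,
    ignore_index: int = -100,
) -> List[Tuple[int, int]]:
    """Return (position, value) pairs for targets outside [0, num_classes).

    Hypothetical helper for illustration only. Targets equal to
    ignore_index are skipped, mirroring the default ignore_index=-100
    of torch.nn.functional.cross_entropy.
    """
    bad = []
    for pos, value in enumerate(targets):
        if value == ignore_index:
            continue
        if value < 0 or value >= num_classes:
            bad.append((pos, value))
    return bad


# Assumed scenario: the logits have only 4 columns, but the labels
# run from 0 to 7, so positions 4..7 are out of range.
labels = [0, 1, 2, 3, 4, 5, 6, 7]
print(find_out_of_range_targets(labels, num_classes=4))
```

An empty result means every label is a valid class index for the given logits. It can also help to rerun the training command with the environment variable `CUDA_LAUNCH_BLOCKING=1` set, which makes CUDA kernel launches synchronous so that the Python traceback points at the operation that actually failed.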