hpcaitech / PaLM-colossalai

Scalable PaLM implementation of PyTorch

torch.distributed.elastic.multiprocessing.errors.ChildFailedError

cainiaogoroad opened this issue

PaLM error screenshot 1
PaLM error screenshot 2
Above is the program's run log; it reports torch.distributed.elastic.multiprocessing.errors.ChildFailedError.
Does anybody know why this happens? Thanks!