Oneflow-Inc / one-glm

A more efficient GLM implementation!


Distributed Backend is not initialized

Joris-Fu opened this issue · comments

On the `import_flow_as_torch` branch, running `scripts/ds_finetune_seq2seq.sh config_tasks/model_blocklm_large_chinese.sh config_tasks/seq_customization.sh` raises the AssertionError: "Distributed Backend is not initialized. Please set dist_init_required to True or initialize before calling deepspeed.initialize()".

But it seems that the distributed initialization code in pretrain_glm.py was deleted on purpose, namely:

```python
torch.distributed.init_process_group(
    backend=args.distributed_backend,
    world_size=args.world_size,
    rank=args.rank,
    init_method=init_method)
```
How can this problem be solved?
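For anyone hitting the same assertion, here is a minimal sketch of two possible workarounds, assuming the usual `args` and `model` objects built in `pretrain_glm.py` (field names such as `args.master_ip` and `args.master_port` are hypothetical placeholders, not the repo's exact code): either restore the explicit process-group initialization before DeepSpeed runs, or let DeepSpeed initialize the backend itself via `dist_init_required=True`.

```python
# Sketch only: `args`/`model` stand in for the objects constructed in
# pretrain_glm.py; args.master_ip / args.master_port are hypothetical.
import torch
import deepspeed

# Option 1: restore the removed explicit initialization before any
# call into deepspeed.initialize().
if not torch.distributed.is_initialized():
    init_method = 'tcp://{}:{}'.format(args.master_ip, args.master_port)
    torch.distributed.init_process_group(
        backend=args.distributed_backend,  # e.g. 'nccl'
        world_size=args.world_size,
        rank=args.rank,
        init_method=init_method)

# Option 2: ask DeepSpeed to set up the backend itself, as the
# assertion message suggests.
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=model.parameters(),
    dist_init_required=True)
```

Note that on the `import_flow_as_torch` branch, `torch` may actually be backed by OneFlow, so whether stock DeepSpeed accepts either path there is untested; this is only an outline of the two fixes the error message points to.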