run ds_pretrain_nvidia.sh

Question

Running ds_pretrain_nvidia.sh fails with the traceback below.

lulia0228 opened this issue a year ago

  File "pretrain_glm.py", line 500, in initialize_distributed
  File "pretrain_glm.py", line 470, in set_deepspeed_activation_checkpointing
  File "/usr/local/conda/envs/llm_fine_tune/lib/python3.8/site-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 804, in _configure_using_config_file
    if dist.get_rank() == 0:
  File "/usr/local/conda/envs/llm_fine_tune/lib/python3.8/site-packages/deepspeed/comm/comm.py", line 575, in get_rank
    assert cdb is not None and cdb.is_initialized(), 'DeepSpeed backend not set, please initialize it using init_process_group()'

Zhengxiao Du · Answer 1 · Sat Mar 04 2023 19:07:12 GMT+0800 (China Standard Time)

This happens because the code doesn't support the latest version of DeepSpeed. You can either install DeepSpeed <= 0.6.2 or replace the torch.distributed.init_process_group call in initialize_distributed with deepspeed.init_distributed.
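The second option could look roughly like the sketch below. It is not the actual code from pretrain_glm.py: the helper name init_distributed_backend and its use_deepspeed flag are made up for illustration, and it assumes the launcher exports the usual RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT environment variables.

```python
def init_distributed_backend(backend: str = "nccl", use_deepspeed: bool = True) -> str:
    """Initialize the process group and report which initializer was used.

    Hypothetical helper, not part of pretrain_glm.py.
    """
    if use_deepspeed:
        # Imported lazily so the module still loads where DeepSpeed is absent.
        import deepspeed

        # In recent DeepSpeed releases this also sets up DeepSpeed's own
        # communication backend, which the failing dist.get_rank() call
        # in the traceback asserts on.
        deepspeed.init_distributed(dist_backend=backend)
        return "deepspeed"

    # Older path: plain PyTorch initialization, which may only be enough
    # for DeepSpeed versions that predate the deepspeed.comm layer.
    import torch.distributed as dist

    dist.init_process_group(backend=backend, init_method="env://")
    return "torch"
```

Inside initialize_distributed, a call such as init_distributed_backend(args.distributed_backend) would then take the place of the torch.distributed.init_process_group call, before set_deepspeed_activation_checkpointing runs. The attribute name args.distributed_backend is likewise an assumption about the script's arguments.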

lulia0228 · Answer 2 · Sat Mar 04 2023 19:10:47 GMT+0800 (China Standard Time)

This happens because the code doesn't support the latest version of DeepSpeed. You can either install DeepSpeed <= 0.5.9 or replace the torch.distributed.init_process_group call in initialize_distributed with deepspeed.init_distributed.

Thanks for your reply!