THUDM / GLM

GLM (General Language Model)


run ds_pretrain_nvidia.sh

lulia0228 opened this issue

  File "pretrain_glm.py", line 500, in initialize_distributed
  File "pretrain_glm.py", line 470, in set_deepspeed_activation_checkpointing
  File "/usr/local/conda/envs/llm_fine_tune/lib/python3.8/site-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 804, in _configure_using_config_file
    if dist.get_rank() == 0:
  File "/usr/local/conda/envs/llm_fine_tune/lib/python3.8/site-packages/deepspeed/comm/comm.py", line 575, in get_rank
    assert cdb is not None and cdb.is_initialized(), 'DeepSpeed backend not set, please initialize it using init_process_group()'

This happens because the code doesn't support the latest versions of DeepSpeed. You can either install DeepSpeed <= 0.6.2, or replace the torch.distributed.init_process_group call in initialize_distributed with deepspeed.init_distributed.

This happens because the code doesn't support the latest versions of DeepSpeed. You can either install DeepSpeed <= 0.5.9, or replace the torch.distributed.init_process_group call in initialize_distributed with deepspeed.init_distributed.
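
For the second option, here is a rough sketch of what the replacement could look like. It is not the actual code from pretrain_glm.py: the helper names ensure_rendezvous_env and init_distributed_backend are made up for illustration, and the default address and port are assumptions. It also assumes the launcher exports RANK and WORLD_SIZE, and that your DeepSpeed version accepts the dist_backend argument of deepspeed.init_distributed.

```python
import os


def ensure_rendezvous_env(env=os.environ, addr="localhost", port="6000"):
    """Fill in MASTER_ADDR / MASTER_PORT if the launcher did not set them.

    Hypothetical helper; the default address and port are assumptions.
    """
    env.setdefault("MASTER_ADDR", addr)
    env.setdefault("MASTER_PORT", str(port))
    return env["MASTER_ADDR"], env["MASTER_PORT"]


def init_distributed_backend(backend="nccl"):
    """Initialize the process group through DeepSpeed instead of torch directly.

    Hypothetical stand-in for the init_process_group call inside
    initialize_distributed.
    """
    import deepspeed  # imported lazily so the helper above works without it

    ensure_rendezvous_env()
    # Replaces: torch.distributed.init_process_group(backend=backend, ...)
    # RANK and WORLD_SIZE are expected in the environment (set by the launcher).
    deepspeed.init_distributed(dist_backend=backend)
```

Going through deepspeed.init_distributed should let DeepSpeed register its own communication backend, which is what the dist.get_rank() assertion in the traceback checks for. Calls that come later in initialize_distributed, such as the activation checkpointing setup, would stay where they are.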

Thanks for your reply!