THUDM / GLM

GLM (General Language Model)


Error when evaluating LAMBADA: argument error and distributed error reported

haiqizhang opened this issue

Command:

bash scripts/evaluate_lm.sh \
     config_tasks/model_blocklm_large_generation.sh \
     config_tasks/zero_lambada.sh
Error message:
finetune_glm.py: error: argument --experiment-name: expected one argument
/home/letrain/miniconda/envs/glm/lib/python3.6/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

FutureWarning,
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 86020) of binary: /home/letrain/miniconda/envs/glm/bin/python
Traceback (most recent call last):
File "/home/letrain/miniconda/envs/glm/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/letrain/miniconda/envs/glm/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/letrain/miniconda/envs/glm/lib/python3.6/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/home/letrain/miniconda/envs/glm/lib/python3.6/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/letrain/miniconda/envs/glm/lib/python3.6/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/letrain/miniconda/envs/glm/lib/python3.6/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/home/letrain/miniconda/envs/glm/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/letrain/miniconda/envs/glm/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
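The distributed traceback is secondary: torch.distributed.elastic is only reporting that the child process exited with code 2, which is the exit code argparse uses when it rejects the command line. argparse emits "expected one argument" when a flag that takes a value is followed by another flag or by nothing, which typically happens when a shell variable feeding a flag such as --experiment-name expands to an empty string in the launch script. A minimal sketch of that failure mode (the flag values below are made up for illustration, not taken from GLM's configs):

```python
import argparse

parser = argparse.ArgumentParser(prog="finetune_glm.py")
parser.add_argument("--experiment-name", required=True)

# Works: the flag is followed by a value.
print(parser.parse_args(["--experiment-name", "blocklm-large_lambada"]))

# Simulates an empty shell variable in the launch command: the token
# after --experiment-name starts with "--", so argparse refuses to
# treat it as the value and exits with code 2, matching the exitcode
# torch.distributed.elastic reports above.
parser.parse_args(["--experiment-name", "--seq-length"])
```

So the thing to check is whether the two config files are actually being sourced by scripts/evaluate_lm.sh (for example, a lost line continuation would leave their variables unset).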
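The FutureWarning about torch.distributed.launch is unrelated to the crash. For completeness, the migration it asks for looks roughly like this (a sketch under the warning's own instructions, not GLM's actual code):

```python
import os

# Old style: torch.distributed.launch passes --local_rank as a
# command-line flag, declared via argparse in the training script.
# New style: torchrun (and launch.py with --use_env) exports
# LOCAL_RANK into the environment instead, so read it from there.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
```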