Running zh_msra.sh raises `No rendezvous handler for env://`
gouhaogou opened this issue · comments
Environment
PyTorch Version : 1.7.1
OS : Windows 10
Python version: 3.6.12
CUDA/cuDNN version: 10.2
GPU models and configuration: 1050
Full output:
$ sh zh_msra.sh
Some weights of the model checkpoint at D:/Git/mnt/mrc/chinese_roberta_wwm_large_ext_pytorch were not used when initializing BertQueryNER: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertQueryNER from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertQueryNER from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertQueryNER were not initialized from the model checkpoint at D:/Git/mnt/mrc/chinese_roberta_wwm_large_ext_pytorch and are newly initialized: ['start_outputs.weight', 'start_outputs.bias', 'end_outputs.weight', 'end_outputs.bias', 'span_embedding.classifier1.weight', 'span_embedding.classifier1.bias', 'span_embedding.classifier2.weight', 'span_embedding.classifier2.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
GPU available: True, used: False
INFO:lightning:GPU available: True, used: False
TPU available: False, using: 0 TPU cores
INFO:lightning:TPU available: False, using: 0 TPU cores
D:\Miniconda3\envs\pytorch\lib\site-packages\pytorch_lightning\utilities\distributed.py:37: UserWarning: GPU available but not used. Set the --gpus flag when calling the script.
warnings.warn(*args, **kwargs)
Using native 16bit precision.
INFO:lightning:Using native 16bit precision.
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
INFO:lightning:initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
Traceback (most recent call last):
File "trainer.py", line 385, in
main()
File "trainer.py", line 380, in main
trainer.fit(model)
File "D:\Miniconda3\envs\pytorch\lib\site-packages\pytorch_lightning\trainer\states.py", line 48, in wrapped_fn
result = fn(self, *args, **kwargs)
File "D:\Miniconda3\envs\pytorch\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1058, in fit
results = self.accelerator_backend.spawn_ddp_children(model)
File "D:\Miniconda3\envs\pytorch\lib\site-packages\pytorch_lightning\accelerators\ddp_backend.py", line 123, in spawn_ddp_children
results = self.ddp_train(local_rank, mp_queue=None, model=model, is_master=True)
File "D:\Miniconda3\envs\pytorch\lib\site-packages\pytorch_lightning\accelerators\ddp_backend.py", line 164, in ddp_train
self.trainer.is_slurm_managing_tasks
File "D:\Miniconda3\envs\pytorch\lib\site-packages\pytorch_lightning\core\lightning.py", line 908, in init_ddp_connection
torch_distrib.init_process_group(torch_backend, rank=global_rank, world_size=world_size)
File "D:\Miniconda3\envs\pytorch\lib\site-packages\torch\distributed\distributed_c10d.py", line 434, in init_process_group
init_method, rank, world_size, timeout=timeout
File "D:\Miniconda3\envs\pytorch\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for env://
What is causing this?
Have you solved this? I ran into the same problem. It seems to be a GPU issue: my machine has only one GPU, but the code runs on 4 GPUs in parallel, and I don't know what to change. Did you find a fix? Thanks!
@gouhaogou @Dylan-SUFE Just modify the `--gpus` argument. See the pytorch-lightning documentation for details.
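As an illustration of the fix above, here is a hypothetical sketch of the relevant line in zh_msra.sh. The exact flag values in the original script and the other arguments it passes are assumptions, not quoted from the repo; the point is only that requesting a single GPU keeps pytorch-lightning from initializing DDP (and thus from hitting the unsupported `env://` rendezvous on Windows):

```shell
# Hypothetical excerpt of zh_msra.sh.
# The original invocation requests multiple GPUs, e.g.:
#   python trainer.py --gpus "0,1,2,3" ...
# which makes pytorch-lightning spawn DDP workers and fail on Windows.
# Requesting only the single available GPU avoids DDP entirely:
python trainer.py \
    --gpus "0," \
    ...   # other arguments unchanged
```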
@YuxianMeng Thanks, solved.
Hi, how did you solve it? Which GPU parameter did you change, and where? Could you explain in detail? Thank you!
Hi, are you also working on this? Could you share a QQ number or other contact so we can discuss?
Sure, QQ: 1627128918. I've been reproducing this paper recently.