Running zh_msra.sh raises `No rendezvous handler for env://`
gouhaogou opened this issue · comments
Environment
PyTorch Version : 1.7.1
OS : Windows 10
Python version: 3.6.12
CUDA/cuDNN version: 10.2
GPU models and configuration: 1050
Full output:
$ sh zh_msra.sh
Some weights of the model checkpoint at D:/Git/mnt/mrc/chinese_roberta_wwm_large_ext_pytorch were not used when initializing BertQueryNER: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertQueryNER from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertQueryNER from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertQueryNER were not initialized from the model checkpoint at D:/Git/mnt/mrc/chinese_roberta_wwm_large_ext_pytorch and are newly initialized: ['start_outputs.weight', 'start_outputs.bias', 'end_outputs.weight', 'end_outputs.bias', 'span_embedding.classifier1.weight', 'span_embedding.classifier1.bias', 'span_embedding.classifier2.weight', 'span_embedding.classifier2.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
GPU available: True, used: False
INFO:lightning:GPU available: True, used: False
TPU available: False, using: 0 TPU cores
INFO:lightning:TPU available: False, using: 0 TPU cores
D:\Miniconda3\envs\pytorch\lib\site-packages\pytorch_lightning\utilities\distributed.py:37: UserWarning: GPU available but not used. Set the --gpus flag when calling the script.
warnings.warn(*args, **kwargs)
Using native 16bit precision.
INFO:lightning:Using native 16bit precision.
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
INFO:lightning:initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
Traceback (most recent call last):
File "trainer.py", line 385, in
main()
File "trainer.py", line 380, in main
trainer.fit(model)
File "D:\Miniconda3\envs\pytorch\lib\site-packages\pytorch_lightning\trainer\states.py", line 48, in wrapped_fn
result = fn(self, *args, **kwargs)
File "D:\Miniconda3\envs\pytorch\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1058, in fit
results = self.accelerator_backend.spawn_ddp_children(model)
File "D:\Miniconda3\envs\pytorch\lib\site-packages\pytorch_lightning\accelerators\ddp_backend.py", line 123, in spawn_ddp_children
results = self.ddp_train(local_rank, mp_queue=None, model=model, is_master=True)
File "D:\Miniconda3\envs\pytorch\lib\site-packages\pytorch_lightning\accelerators\ddp_backend.py", line 164, in ddp_train
self.trainer.is_slurm_managing_tasks
File "D:\Miniconda3\envs\pytorch\lib\site-packages\pytorch_lightning\core\lightning.py", line 908, in init_ddp_connection
torch_distrib.init_process_group(torch_backend, rank=global_rank, world_size=world_size)
File "D:\Miniconda3\envs\pytorch\lib\site-packages\torch\distributed\distributed_c10d.py", line 434, in init_process_group
init_method, rank, world_size, timeout=timeout
File "D:\Miniconda3\envs\pytorch\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for env://
What is causing this?
Have you solved this? I ran into the same problem. It seems to be a GPU issue: my machine has only one GPU, but the code runs on 4 GPUs in parallel, and I don't know what to change. Did you find a fix? Thanks!
@gouhaogou @Dylan-SUFE Just modify the `--gpus` argument. See the pytorch-lightning documentation for details.
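As an illustration of the fix above, here is a hypothetical sketch of the relevant line in zh_msra.sh. The exact flag values in the original script and the other arguments it passes are assumptions, not quoted from the repo; the point is only that requesting a single GPU keeps pytorch-lightning from initializing DDP (and thus from hitting the unsupported `env://` rendezvous on Windows):

```shell
# Hypothetical excerpt of zh_msra.sh.
# The original invocation requests multiple GPUs, e.g.:
#   python trainer.py --gpus "0,1,2,3" ...
# which makes pytorch-lightning spawn DDP workers and fail on Windows.
# Requesting only the single available GPU avoids DDP entirely:
python trainer.py \
    --gpus "0," \
    ...   # other arguments unchanged
```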
@YuxianMeng Thanks, solved.
Hi, how did you solve it? Which GPU parameter did you change, and where? Could you explain in detail? Thank you!
Hi, are you also working on this? Could you share a QQ number or other contact so we can discuss?
Sure, QQ: 1627128918. I've been reproducing this paper recently.