Code updated on 2020.08.08 raises RuntimeError: expected backend CUDA and dtype Float but got backend CUDA and dtype Long
zhaoxf4 opened this issue · comments
zhaoxf4 commented
As the title says: after the 2020.08.08 update, following the same procedure as before the update, training fails with RuntimeError: expected backend CUDA and dtype Float but got backend CUDA and dtype Long.
The log is as follows:
(mrcner) zhaoxf4@nyu:/data1/zhaoxf4/mrc-for-flat-nested-ner$ sh script/train_zh_msra.sh
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
Please notice that merge the args_dict and json_config ... ...
{
"bert_frozen": "false",
"hidden_size": 768,
"hidden_dropout_prob": 0.2,
"classifier_sign": "multi_nonlinear",
"clip_grad": 1,
"bert_config": {
"attention_probs_dropout_prob": 0.1,
"directionality": "bidi",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"type_vocab_size": 2,
"vocab_size": 21128
},
"config_path": "/data1/zhaoxf4/mrc-for-flat-nested-ner/config/zh_bert.json",
"data_dir": "/data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra",
"bert_model": "/data1/zhaoxf4/pretrained/chinese_L-12_H-768_A-12",
"task_name": null,
"max_seq_length": 100,
"train_batch_size": 8,
"dev_batch_size": 16,
"test_batch_size": 16,
"checkpoint": 10,
"learning_rate": 1e-05,
"num_train_epochs": 15,
"warmup_proportion": -1.0,
"max_grad_norm": 1.0,
"gradient_accumulation_steps": 1,
"seed": 2333,
"output_dir": "/data1/zhaoxf4/mrc-for-flat-nested-ner/model_save/zh_msra_100_1e-5_8_0.3_15_10",
"data_sign": "zh_msra",
"weight_start": 1.0,
"weight_end": 1.0,
"weight_span": 1.0,
"entity_sign": "flat",
"n_gpu": 1,
"dropout": 0.3,
"entity_threshold": 0.5,
"num_data_processor": 10,
"data_cache": true,
"export_model": true,
"do_lower_case": false,
"fp16": false,
"amp_level": "O2",
"local_rank": -1
}
-*--*--*--*--*--*--*--*--*--*-
current data_sign: zh_msra
=*==*==*==*==*==*==*==*==*==*=
loading train data ... ...
125184
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-0
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-1
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-2
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-3
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-4
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-5
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-6
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-7
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-8
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-9
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-8 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-0 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-5 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-9 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-6 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-4 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-7 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-3 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-2 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.train.cache.100.10-1 <<< <<< <<<
check number of examples before and after data processing :
125184 125184
125184 train data loaded
=*==*==*==*==*==*==*==*==*==*=
loading dev data ... ...
13908
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-0
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-1
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-2
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-3
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-4
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-5
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-6
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-7
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-8
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-9
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-8 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-1 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-5 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-3 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-9 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-4 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-2 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-0 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-6 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.dev.cache.100.10-7 <<< <<< <<<
check number of examples before and after data processing :
13908 13908
13908 dev data loaded
=*==*==*==*==*==*==*==*==*==*=
loading test data ... ...
13095
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-0
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-1
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-2
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-3
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-4
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-5
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-6
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-7
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-8
>>> >>> >>> export sliced features to : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-9
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-6 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-4 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-7 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-5 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-3 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-9 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-2 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-8 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-1 <<< <<< <<<
load sliced features from : /data1/zhaoxf4/mrc-for-flat-nested-ner/data/zh_msra/mrc-ner.test.cache.100.10-0 <<< <<< <<<
check number of examples before and after data processing :
13095 13095
13095 test data loaded
######################################################################
EPOCH: 0
Traceback (most recent call last):
File "/data1/zhaoxf4/mrc-for-flat-nested-ner/run/train_bert_mrc.py", line 339, in <module>
main()
File "/data1/zhaoxf4/mrc-for-flat-nested-ner/run/train_bert_mrc.py", line 335, in main
train(model, optimizer, sheduler, train_loader, dev_loader, test_loader, config, device, n_gpu, label_list)
File "/data1/zhaoxf4/mrc-for-flat-nested-ner/run/train_bert_mrc.py", line 185, in train
start_positions=start_pos, end_positions=end_pos, span_positions=span_pos, span_label_mask=span_label_mask)
File "/data0/zhaoxf4/.conda/envs/mrcner/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/data1/zhaoxf4/mrc-for-flat-nested-ner/model/bert_mrc.py", line 72, in forward
start_loss = torch.sum(start_loss * token_type_ids.view(-1))
RuntimeError: expected backend CUDA and dtype Float but got backend CUDA and dtype Long
Deleted user commented
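Hello, and thanks for the question.
We re-cloned the master repo (532f804) locally and reran the MSRA experiment with your parameter configuration. The model trained normally, and we could not reproduce your problem.
Our experiment environment is: CUDA Version 10.1, PyTorch=1.6.0, Python 3.6.9
Which PyTorch, CUDA, and Python versions are you using?
Thank you very much!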
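One quick way to collect the three version numbers asked for above is sketched below. The helper name `describe_environment` is hypothetical and not part of the repo; it only reads attributes that PyTorch and the standard library expose.

```python
import platform

import torch


def describe_environment():
    """Collect the version details that matter when reporting a dtype error.

    Hypothetical helper for illustration; it is not part of the repo.
    """
    return {
        "python": platform.python_version(),
        "torch": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


for name, value in describe_environment().items():
    print(f"{name}: {value}")
```

Run it inside the same conda environment that launches `script/train_zh_msra.sh`, so that the reported versions match the ones used for training.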
zhaoxf4 commented
I am using the configuration from the earlier requirements: cuda=10.0.130 torch==1.1.0 python 3.6.10
zhaoxf4 commented
Hello, I recreated the virtual environment and upgraded to PyTorch=1.6.0, and it now runs normally. It looks like it was a PyTorch version issue. Thanks!
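For readers who cannot upgrade: the failing line in the traceback multiplies a Float loss by `token_type_ids`, which is a Long tensor. On torch==1.1.0 that mixed-dtype multiplication raised the error above, while on PyTorch=1.6.0 it ran. A minimal sketch of an alternative, assuming the mask should simply be treated as floating point, is to cast it explicitly. The tensor values below are made up, and this is not confirmed to be how the repo handles it.

```python
import torch

# Made-up stand-ins for the tensors used at model/bert_mrc.py line 72.
start_loss = torch.tensor([0.5, 1.0, 0.25, 2.0])   # per-token loss, float32
token_type_ids = torch.tensor([[0, 0], [1, 1]])    # segment ids, int64 (Long)

# Cast the Long mask to float before multiplying, so the operation does not
# depend on the installed PyTorch version accepting mixed dtypes.
mask = token_type_ids.view(-1).float()
masked_start_loss = torch.sum(start_loss * mask)

print(mask.dtype)
print(masked_start_loss.item())
```

The explicit cast leaves the result unchanged on newer PyTorch releases, so it should be safe to keep after an upgrade.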