kevinduh / san_mrc

Stochastic Answer Networks (SAN) for Machine Reading Comprehension

SQuAD 2.0 cuda runtime error

Theerit opened this issue

Hi all,
I am fairly new to PyTorch and deep learning. I am interested in your implementation and tried running your code for both SQuAD 1.1 and SQuAD 2.0. The SQuAD 1.1 run succeeded, but the SQuAD 2.0 run failed with the error below.

File "train.py", line 168, in <module>
main()
File "train.py", line 111, in main
results, labels = predict_squad(model, dev_data, v2_on=args.v2_on)
File "/home/san_mrc/my_utils/data_utils.py", line 34, in predict_squad
phrase, spans, scores = model.predict(batch)
File "/home/san_mrc/src/model.py", line 112, in predict
start, end, lab = self.network(batch)
File "/home/anaconda2/envs/SAN/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/san_mrc/src/dreader.py", line 88, in forward
doc_elmo, query_elmo = self.lexicon_encoder(batch)
File "/home/anaconda2/envs/SAN/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/san_mrc/src/encoder.py", line 146, in forward
doc_cove_low, doc_cove_high = self.ContextualEmbed(doc_tok, doc_mask)
File "/home/anaconda2/envs/SAN/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/san_mrc/src/recurrent.py", line 140, in forward
output1, _ = self.rnn1(pack(x_hiddens[indices], lens.tolist(), batch_first=True))
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535493744281/work/aten/src/THC/generated/../THCReduceAll.cuh:317

  • The config options I changed are fix_embeddings and the number of epochs.
  • The error appeared after the first training epoch finished, before the model file was dumped/saved.
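
For anyone trying to reproduce this: a "device-side assert triggered" error in PyTorch is reported asynchronously, so the line shown in the traceback is not necessarily the operation that failed. One common cause (not confirmed for this repository) is an index that falls outside the table it is looked up in, such as a token id larger than the embedding vocabulary. The sketch below sets the `CUDA_LAUNCH_BLOCKING` environment variable so the traceback becomes accurate on a rerun, and scans a batch of token ids for out-of-range values. The helper name `find_bad_token_ids` and the sample batch are hypothetical and not part of san_mrc.

```python
import os

# Run CUDA kernels synchronously so the Python traceback points at the
# operation that actually failed. This must be set before CUDA is initialised.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def find_bad_token_ids(token_ids, vocab_size):
    """Return (row, col, id) for every id outside the range [0, vocab_size).

    token_ids is a list of sequences of integer ids (hypothetical layout).
    """
    bad = []
    for row, seq in enumerate(token_ids):
        for col, tok in enumerate(seq):
            if tok < 0 or tok >= vocab_size:
                bad.append((row, col, tok))
    return bad


# Hypothetical batch: two padded sequences, vocabulary of 5 entries.
batch = [[1, 4, 0], [2, 7, 0]]
print(find_bad_token_ids(batch, vocab_size=5))  # -> [(1, 1, 7)]
```

Running the same failing step on the CPU is another way to get a readable message, since the CPU path typically raises an ordinary index error instead of a device-side assert.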