error in running train.py

Question

cshanjiewu opened this issue 4 years ago · comments

When I run 'python train.py' with the default settings and the default feature dataset provided in this GitHub repository, I get the following error:
Traceback (most recent call last):
File "****/avsd/train.py", line 95, in <module>
dataset = VisDialDataset(args, ['train'])
File "******/avsd/dataloader.py", line 157, in __init__
self._process_history(dtype)
File "********/avsd/dataloader.py", line 296, in _process_history
= captions[th_id][:max_ques_len + max_ans_len]
RuntimeError: The expanded size of the tensor (44) must match the existing size (40) at non-singleton dimension 0. Target sizes: [44]. Tensor sizes: [40]

hank · Answer 1 · Thu Sep 10 2020 16:56:10 GMT+0800 (China Standard Time)

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.c:36
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/THCTensorIndex.cu:325: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [7,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/THCTensorIndex.cu:325: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [7,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

Tom Winterbottom · Answer 2 · Mon Sep 14 2020 01:00:45 GMT+0800 (China Standard Time)

Hey, I'm not the author of this, but I had the same problem. Until we get an update, you can work around it by capping the length of the history slice with min:

history[th_id][round_id][:min(40, max_ques_len + max_ans_len)] = captions....
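The idea behind this workaround can be sketched as follows. This is only an illustration, not the repository's code: `copy_clamped` is a hypothetical helper, and plain Python lists stand in for the 1-D tensors so the snippet runs without PyTorch. Instead of hard-coding 40, it clamps the copy length to whichever of the two sequences is shorter.

```python
def copy_clamped(dst, src, max_len):
    """Copy at most max_len leading items from src into dst.

    The count is clamped to the shorter of the two sequences, so the
    slice on the left always has the same length as the slice on the right.
    (Hypothetical helper, not part of the avsd repository.)
    """
    n = min(max_len, len(dst), len(src))
    dst[:n] = src[:n]
    return n


# Stand-ins for one history row (44 slots) and one caption (40 tokens),
# matching the sizes reported in the error message.
history_row = [0] * 44
caption = list(range(1, 41))

copied = copy_clamped(history_row, caption, max_len=44)
print(copied)            # 40
print(history_row[38:])  # [39, 40, 0, 0, 0, 0]
```

The same slice assignment should also work on 1-D PyTorch tensors, since len() returns the size of the first dimension.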

Karan Sheth · Answer 3 · Fri Oct 02 2020 19:10:22 GMT+0800 (China Standard Time)

Traceback (most recent call last):
File "/content/drive/My Drive/Colab Notebooks/train.py", line 167, in <module>
enc_out = encoder(batch)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/Colab Notebooks/encoders/lf.py", line 92, in forward
hist_embed = self.hist_rnn(hist_embed, batch['hist_len'])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/Colab Notebooks/utils/dynamic_rnn.py", line 34, in forward
sorted_seq_input, lengths=sorted_len, batch_first=True)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/utils/rnn.py", line 234, in pack_padded_sequence
lengths = torch.as_tensor(lengths, dtype=torch.int64)
RuntimeError: CUDA error: device-side assert triggered
ERROR WHILE RUNNING train.py

Yuxuan Wang · Answer 4 · Fri Jul 16 2021 14:49:51 GMT+0800 (China Standard Time)

I have the same problem. Did you solve it?