VDCNN is failing with binary classification

Question

spandanagella opened this issue 5 years ago

Hi,

I trained VDCNN models on AG News and a few other datasets of my own, and they worked as expected. However, when I run the same code on binary classification datasets (including Yelp polarity), the model fails with the error below. I tested this with multiple binary classification datasets of different sizes.

THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/SpatialDilatedMaxPooling.cu line=120 error=59 : device-side assert triggered

Any idea why this is happening? I would really appreciate any pointers.

Thanks,
Spandana

Complete error:

  File "/code/vdcnn/vdcnn_working_nlp_benchmarks/src/vdcnn/main.py", line 356, in <module>
/pytorch/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [78,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.
    train_acc = train(epoch,net, tr_loader, device, msg="training", optimize=True, optimizer=optimizer, scheduler=scheduler, criterion=criterion)
  File "code/vdcnn/vdcnn_working_nlp_benchmarks/src/vdcnn/main.py", line 192, in train
    out = net(data[0])
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "code/vdcnn/vdcnn_working_nlp_benchmarks/src/vdcnn/net.py", line 113, in forward
    out = self.layers(out)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 77, in forward
    self.return_indices)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/_jit_internal.py", line 132, in fn
    return if_false(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 394, in _max_pool1d
    input, kernel_size, stride, padding, dilation, ceil_mode)[0]
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 386, in max_pool1d_with_indices
    input, kernel_size, _stride, padding, dilation, ceil_mode)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/SpatialDilatedMaxPooling.cu:120
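
A note on reading this trace: the assertion srcIndex < srcSelectDimSize is raised by an index-select kernel, which is the kind of operation an embedding lookup performs, so it most likely means some input index is larger than the table being indexed. CUDA kernels run asynchronously, so the Python traceback can end at a later op (here max pooling) rather than at the call that actually failed. Running the same batch on CPU, or setting the environment variable CUDA_LAUNCH_BLOCKING=1, usually makes the failing call visible. Below is a minimal sketch of the CPU check; the table size and the helper name lookup_error are hypothetical and not taken from the VDCNN repository.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration; not taken from the VDCNN repository.
embedding = nn.Embedding(num_embeddings=4, embedding_dim=2)


def lookup_error(indices):
    """Run the lookup on CPU; return the error message, or None on success."""
    try:
        embedding(indices)
    except (IndexError, RuntimeError) as exc:  # exception type varies by PyTorch version
        return str(exc)
    return None


valid = torch.tensor([[0, 1, 3]])    # every index is < 4
invalid = torch.tensor([[0, 1, 4]])  # 4 is out of range for a 4-row table

print(lookup_error(valid))    # None
print(lookup_error(invalid))  # an out-of-range message; wording varies by version
```

On CPU the out-of-range index raises a normal Python exception at the embedding call itself, which is much easier to locate than a device-side assert.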

Spandana Gella · Answer 1 · Thu Apr 18 2019 01:24:42 GMT+0800 (China Standard Time)

The PyTorch version I'm using is 1.0.1.

Ardalan · Answer 2 · Fri Apr 19 2019 23:28:52 GMT+0800 (China Standard Time)

Hi @spandanagella,
Thanks for pointing that out!

Indeed, there was a problem with the embedding layer (more precisely, with the size of the dictionary of embeddings). I fixed it in this commit: 393064d

master should work now (tested on the yelp_polarity dataset).
Let me know if it does not.

Cheers,
Ardalan
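
For readers who hit the same assert: the commit itself is not reproduced here, but the general idea of sizing the embedding dictionary can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: the alphabet, the two reserved indices (padding and unknown), and the helper names encode and required_num_embeddings are all hypothetical.

```python
# Hypothetical character vocabulary; the repository's real alphabet may differ.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
PAD_IDX, UNK_IDX = 0, 1  # two reserved indices (an assumption)
char_to_idx = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode(text, max_len=16):
    """Map characters to indices, truncating or padding to max_len."""
    ids = [char_to_idx.get(ch, UNK_IDX) for ch in text.lower()[:max_len]]
    return ids + [PAD_IDX] * (max_len - len(ids))


def required_num_embeddings(encoded):
    """Smallest table size that covers every index in the encoded data."""
    return max(max(seq) for seq in encoded) + 1


# The table needs one row per character plus the reserved indices.
num_embeddings = len(char_to_idx) + 2

encoded = [encode("Great food!"), encode("never again 0 stars")]

# Pre-flight check: fail early, with a readable message, before any GPU work.
needed = required_num_embeddings(encoded)
assert needed <= num_embeddings, f"table has {num_embeddings} rows, data needs {needed}"
```

The value of num_embeddings would then be passed as the first argument of torch.nn.Embedding. If the table is sized from one dataset's vocabulary and another dataset produces larger indices, a check like this catches the mismatch before it becomes a device-side assert.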

Spandana Gella · Answer 3 · Sat Apr 20 2019 00:28:34 GMT+0800 (China Standard Time)

Thanks, Ardalan, for the quick response. This fixed the issue :)