Kernel restarted
cahya-wirawan opened this issue
I have another problem running the notebook CLS-DE.ipynb. If I use conda and install the default pytorch package (1.3.1), then after running the command
exp.finetune_lm.train_(cls_dataset, num_epochs=20)
I get the following error message:
ImportError: /tmp/torch_extensions/forget_mult_cuda/forget_mult_cuda.so: undefined symbol: _ZN3c106Symbol14fromQualStringERKSs
Then I installed PyTorch from the pytorch channel as follows:
conda install pytorch=1.3.1 torchvision cudatoolkit=10.0 -c pytorch
The "undefined symbol" error is gone, but now the kernel restarts during the first epoch of exp.finetune_lm.train_(cls_dataset, num_epochs=20).
Is this a known problem? The following Python packages may be relevant:
$ conda list| egrep 'torch|^fastai|cuda|nvid'
_pytorch_select 0.2 gpu_0
cudatoolkit 10.0.130 0
cudnn 7.6.5 cuda10.0_0
fastai 1.0.61 1 fastai
nvidia-ml-py3 7.352.0 py_0 fastai
pytorch 1.3.1 cuda100py37h53c1284_0
torchvision 0.4.2 cuda100py37hecfc37a_0
Thanks.
I fixed the kernel restarts by switching from CUDA 10.0 to CUDA 9.2. It seems the model does not work with the newer CUDA version in this setup. The notebook now runs properly to the end.
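For anyone comparing setups, here is a minimal sketch for checking which CUDA version the installed PyTorch build actually targets, run from the same kernel as the notebook. The helper name describe_torch_runtime is made up for illustration and is not part of the notebook or of fastai. It only reports what PyTorch says about itself and does not diagnose the restart.

```python
def describe_torch_runtime():
    """Collect the versions PyTorch reports about itself.

    Hypothetical helper for illustration only; returns a note instead of
    raising if PyTorch cannot be imported in this environment.
    """
    try:
        import torch
    except (ImportError, OSError) as exc:
        # A broken or missing install ends up here (e.g. missing shared libraries).
        return {"torch": None, "error": str(exc)}

    cuda_ok = torch.cuda.is_available()
    return {
        "torch": torch.__version__,
        # CUDA version the binary was built against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        "cuda_available": cuda_ok,
        # Only query cuDNN and the device when a GPU is actually usable.
        "cudnn": torch.backends.cudnn.version() if cuda_ok else None,
        "device": torch.cuda.get_device_name(0) if cuda_ok else None,
    }


if __name__ == "__main__":
    for key, value in describe_torch_runtime().items():
        print(f"{key}: {value}")
```

If built_for_cuda does not match the cudatoolkit version shown by conda list, the environment is probably mixing builds from different channels, which may be worth ruling out before changing CUDA versions.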