Training continue_train.py gives RuntimeError
janlycka opened this issue · comments
Greetings to all the developers of deep-synth,
I am experiencing some trouble with commit cc0e991, possibly because I am using newer libraries than it was designed for. I would like to know which
12.01.2021
OS: Ubuntu 18.04
CUDA: 10.1
Python: 3.8.5
pytorch: 1.7.1 (with 0.4 I cannot even run python create_data.py)
scipy: 1.2.0
Having successfully generated the data with create_data.py, we now have several folders in the data directory: bedroom, living, office
Running location_train.py --data-dir bedroom --save-dir res_2_bedroom --train-size 500 --use-count
works
Running continue_train.py --data-dir bedroom --save-dir res_1_bedroom --train-size 500 --use-count
doesn't
I get the following Error:
RuntimeError: Expected object of scalar type Long but got scalar type Bool for argument #2 'target' in call to _thnn_nll_loss_forward
looking forward to hearing from you,
KR
probably a version issue - just update the binary target to Long
I have very limited knowledge of pytorch. Could you elaborate?
Thanks & KR
deep-synth/deep-synth/continue_dataset.py
Line 67 in b800e11
Alright, I've solved it. (Edit: Also thanks for your reply, I was typing up the response at the time at which you answered)
TLDR:
in continue_train.py, change lines 137 and 179 from
data, target = data.cuda(), target.cuda()
to
data, target = data.cuda(), target.cuda().long()
continue_train.py now runs properly
python continue_train.py --data-dir bedroom --save-dir res_1_bedroom --train-size 500 --use-count
I hadn't understood your answer at first, as I had been unaware that tensors have their own element types, such as Bool, Double or Long. These are distinct from the standard C-family datatypes bool, double, int, long etc. If I hear update bool to long, I understand casting
This project uses a very old version of the pytorch library, 0.4.1, which was impossible to use together with the CUDA 10.1 installed on our system. Since 2018, when deep-synth was released, pytorch has changed: as the error message shows, pytorch 1.7.1 expects the loss target to be a tensor of type Long, while the target it now receives is of type Bool. We therefore have to cast it like so:
target = target.cuda().long()
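The effect of the cast can be tried in isolation. This is a minimal sketch, not code from deep-synth: the tensors `log_probs` and `target` are made-up stand-ins for one batch, and it runs on the CPU so no GPU is needed. It assumes the loss in question is `nll_loss`, as the error message suggests.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for one batch (not taken from deep-synth).
log_probs = F.log_softmax(torch.randn(4, 2), dim=1)  # 4 samples, 2 classes
target = torch.tensor([True, False, True, True])     # binary labels stored as Bool

try:
    # On PyTorch 1.7.1 this is the call that is expected to fail,
    # because the target is Bool rather than Long.
    F.nll_loss(log_probs, target)
except RuntimeError as err:
    print("without cast:", err)

# .long() converts True -> 1 and False -> 0, giving valid class indices.
loss = F.nll_loss(log_probs, target.long())
print("with cast:", loss.item())
```

In the training script the same `.long()` call is simply chained after `.cuda()`, as in the line above.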
This helpful link and two pythonists whom I know saved the day
A similar issue for further reading can be found here
While on the subject of outdated libraries in this commit, which is understandable considering it's from 2018: whoever tries to get this thing afloat in the future, make sure that your SciPy version is 1.2 or older, as version 1.3 also introduced a breaking change.
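A small guard can catch a too-new SciPy before anything else runs. This is only a sketch: the helper names `version_tuple` and `scipy_is_supported` are made up for illustration, and the `< 1.3` bound comes from the observation above rather than from the project itself.

```python
def version_tuple(version):
    """Turn '1.2.0' into (1, 2, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def scipy_is_supported(version):
    """True for SciPy versions older than 1.3 (assumed safe for this commit)."""
    return version_tuple(version) < (1, 3)


try:
    import scipy
except ImportError:
    print("SciPy is not installed")
else:
    if scipy_is_supported(scipy.__version__):
        print("SciPy", scipy.__version__, "should be fine")
    else:
        print("SciPy", scipy.__version__, "is too new, install scipy==1.2.0")
```

Comparing tuples of integers instead of strings avoids the trap where "1.10" sorts before "1.3".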
Library links
Matplotlib
Numpy
Numba
SciKit-learn
SciPy
Versions of the libraries for commit cc0e991
Here are the versions of the libraries contained in requirements.txt, which I currently have installed on my machine and which most definitely work
replace requirements.txt with this (gets you the desired versions)
matplotlib==3.3.3
numpy==1.19.2
numba==0.52.0
pybullet==2.1.3
pyquaternion==0.9.9
scikit-learn==0.24.0
scipy==1.2.0
install by running
pip install -r requirements.txt
check the library versions
And finally, this serves to save you the 20 minutes of googling
install pyTorch like so
conda install pytorch torchvision torchaudio cudatoolkit=10.1 -c pytorch
check pytorch version
python -c "import torch; print(torch.__version__)"
1.7.1
check matplotlib version
python -c "import matplotlib; print(matplotlib.__version__)"
3.3.3
check numpy version
python -c "import numpy; print(numpy.version.version)"
1.19.2
check scipy version
python -c "import scipy; print(scipy.version.version)"
1.2.0
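The individual checks above can also be combined into one script. The following is a sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the function names `parse_pins` and `check_pins` are made up for illustration, and the pins are copied from the requirements.txt listed earlier.

```python
from importlib import metadata

# Pins copied from the requirements.txt above.
PINS = """\
matplotlib==3.3.3
numpy==1.19.2
numba==0.52.0
pybullet==2.1.3
pyquaternion==0.9.9
scikit-learn==0.24.0
scipy==1.2.0
"""


def parse_pins(text):
    """Parse 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (wanted, installed, matches)}; installed is None if missing."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_pins(parse_pins(PINS)).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, installed {installed} [{status}]")
```

pytorch is left out of the list on purpose, since it is installed through conda rather than through requirements.txt.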