Train WaveRnn AttributeError
wierzs opened this issue · comments
Hello,
After training Tacotron for about 250k steps, I decided to force GTA and start WaveRNN training.
I ran `train_wavernn.py --gta`; CUDA was detected and the parameters were loaded. After about 10 seconds I got this AttributeError.
Running in PowerShell:
```
PS D:\BetterTTS\WaveRNN> python train_wavernn.py --gta
Using device: cuda
Initialising Model...
Trainable Parameters: 4.234M
Restoring from latest checkpoint...
Loading latest weights: D:\BetterTTS\WaveRNN\checkpoints\ljspeech_mol.wavernn\latest_weights.pyt
Loading latest optimizer state: D:\BetterTTS\WaveRNN\checkpoints\ljspeech_mol.wavernn\latest_optim.pyt
+-------------+------------+--------+--------------+-----------+
|  Remaining  | Batch Size |   LR   | Sequence Len | GTA Train |
+-------------+------------+--------+--------------+-----------+
| 1000k Steps |     32     | 0.0001 |     1375     |   True    |
+-------------+------------+--------+--------------+-----------+
Traceback (most recent call last):
  File "train_wavernn.py", line 159, in <module>
    main()
  File "train_wavernn.py", line 85, in main
    voc_train_loop(paths, voc_model, loss_func, optimizer, train_set, test_set, lr, total_steps)
  File "train_wavernn.py", line 105, in voc_train_loop
    for i, (x, y, m) in enumerate(train_set, 1):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
    data = self._next_data()
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 881, in _process_data
    data.reraise()
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\_utils.py", line 395, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "D:\BetterTTS\WaveRNN\utils\dataset.py", line 68, in collate_vocoder
    mel_win = hp.voc_seq_len // hp.hop_length + 2 * hp.voc_pad
  File "D:\BetterTTS\WaveRNN\utils\__init__.py", line 53, in __getattr__
    raise AttributeError("HParams not configured yet. Call self.configure()")
AttributeError: HParams not configured yet. Call self.configure()
```
So I took a look in `utils/__init__.py` and found this around line 42:
```python
def __init__(self, path: Union[str, Path]=None):
    """Constructs the hyperparameters from a path to a python module. If
    `path` is None, will raise an AttributeError whenever its attributes
    are accessed. Otherwise, configures self based on `path`."""
    if path is None:
        self._configured = False
    else:
        self.configure(path)

def __getattr__(self, item):
    if not self.is_configured():
        raise AttributeError("HParams not configured yet. Call self.configure()")
    else:
        return super().__getattr__(item)
```
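For context, `__getattr__` is only called when normal attribute lookup fails, so this guard fires exactly when a hyperparameter is requested before `configure()` has copied any onto the object. Here is a minimal sketch of that pattern (with a stubbed `configure` and a made-up `voc_seq_len` value, not the repo's real loading logic):

```python
from pathlib import Path
from typing import Union


class HParams:
    """Minimal sketch of the lazy-configuration pattern."""

    def __init__(self, path: Union[str, Path] = None):
        # Normal attribute writes go to __dict__; __getattr__ below
        # only fires when a lookup FAILS, so this does not recurse.
        if path is None:
            self._configured = False
        else:
            self.configure(path)

    def is_configured(self):
        return self._configured

    def configure(self, path):
        # The real repo executes the hparams module at `path` and copies
        # its names onto self; here we just fake one hyperparameter.
        self.voc_seq_len = 1375
        self._configured = True

    def __getattr__(self, item):
        # Reached only for attributes not found by normal lookup.
        if not self.is_configured():
            raise AttributeError("HParams not configured yet. Call self.configure()")
        raise AttributeError(item)


hp = HParams()  # like the module-level `hp` in utils/__init__.py
try:
    hp.voc_seq_len  # raises until configure() has run
except AttributeError as e:
    print(e)  # HParams not configured yet. Call self.configure()

hp.configure("hparams.py")
print(hp.voc_seq_len)  # 1375
```

So the code itself is fine; the question is why `hp` is still unconfigured at the point the error fires.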
I'm an amateur in Python, but isn't this basically saying: `path` is `None` by default, and if `path` is `None`, raise an error on every attribute access? What am I missing here?
Thank you for any answers you may have!
Running on CUDA 10.1
GPU: RTX 2070
Also, `pip list`:
```
Package                Version
---------------------- -----------
absl-py                0.12.0
astor                  0.8.1
audioread              2.1.9
bleach                 1.5.0
cached-property        1.5.2
cachetools             4.2.1
certifi                2020.12.5
chardet                4.0.0
click                  7.1.2
cycler                 0.10.0
dataclasses            0.8
decorator              5.0.6
falcon                 1.2.0
ffmpeg                 1.4
future                 0.18.2
gast                   0.2.2
google-auth            1.29.0
google-auth-oauthlib   0.4.4
google-pasta           0.2.0
grpcio                 1.37.0
h5py                   3.1.0
html5lib               0.9999999
idna                   2.10
importlib-metadata     3.10.0
inflect                0.2.5
joblib                 1.0.1
Keras-Applications     1.0.8
Keras-Preprocessing    1.1.2
kiwisolver             1.3.1
librosa                0.6.3
llvmlite               0.31.0
Markdown               3.3.4
matplotlib             3.1.0
mock                   4.0.3
nltk                   3.6.2
numba                  0.48.0
numpy                  1.16.2
oauthlib               3.1.0
opt-einsum             3.3.0
pathlib                1.0.1
Pillow                 8.2.0
pip                    21.0.1
protobuf               3.15.8
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pydub                  0.25.1
pyparsing              2.4.7
python-dateutil        2.8.1
python-mimeparse       1.6.0
pytz                   2021.1
regex                  2021.4.4
requests               2.25.1
requests-oauthlib      1.3.0
resampy                0.2.2
rsa                    4.7.2
scikit-learn           0.24.1
scipy                  1.0.0
setuptools             56.0.0
six                    1.15.0
SpeechRecognition      3.8.1
srt                    3.4.1
tensorboard            1.12.0
tensorflow-estimator   1.13.0
tensorflow-gpu         1.3.0
tensorflow-tensorboard 0.1.8
termcolor              1.1.0
threadpoolctl          2.1.0
torch                  1.5.1+cu101
torchvision            0.6.1+cu101
tqdm                   4.11.2
typing-extensions      3.7.4.3
Unidecode              0.4.20
urllib3                1.26.4
Werkzeug               1.0.1
wheel                  0.36.2
wrapt                  1.12.1
zipp                   3.4.1
```
I'm not sure if we have the same issue, but try going to the `utils` folder, opening `dataset.py`, and on line 54 changing `num_workers=2,` to `num_workers=0,`.
Tell me if that works.
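For what it's worth, my understanding of why this helps (an educated guess, not verified against the repo): with `num_workers > 0`, each DataLoader worker is a separate process, and on Windows those processes are spawned fresh and re-import the project's modules, so the module-level `hp` object that was configured in the main process comes up unconfigured inside the workers' `collate_vocoder`. With `num_workers=0`, loading and collation run in the main process, where `hp` is already configured. A toy sketch of the setting:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Stand-in dataset; the real repo yields mel/quant pairs instead."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        return torch.tensor(float(idx))


# num_workers=0 loads batches in the main process, so any module-level
# state configured at startup stays visible during collation.
loader = DataLoader(ToyDataset(), batch_size=2, num_workers=0)
first = next(iter(loader))
print(first.tolist())  # [0.0, 1.0]
```

The tradeoff is that `num_workers=0` gives up parallel data loading, which can slow training down somewhat.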
Hello,
Thank you so much for the input! I tried this and now I have another error. At least this is progress.
Now it's saying:
```
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
```
I guess I just have to find out where to add `.cpu()` now. Thank you!
No problem man, glad I could help. I never had that other issue you're now having, so I can't help with that.
Ok, I got it! In `train_wavernn.py`, line 129, change
```python
if np.isnan(grad_norm):
```
to
```python
if np.isnan(grad_norm.cpu()):
```
It's training now. Thank you @Ahmad21A for the help!
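For anyone else hitting this: the gradient norm returned by newer PyTorch versions is a tensor that may live on the GPU, and `np.isnan` can't read GPU memory, hence the `.cpu()` copy. An alternative fix (a sketch, not what the repo ships) is a device-agnostic check with `torch.isnan`, which works on tensors wherever they live:

```python
import math
import torch


def grad_norm_is_nan(grad_norm):
    # clip_grad_norm_ returned a plain float in older torch versions and
    # returns a (possibly CUDA) 0-dim tensor in newer ones; torch.isnan
    # handles tensors on any device without a .cpu() copy.
    if torch.is_tensor(grad_norm):
        return bool(torch.isnan(grad_norm))
    return math.isnan(grad_norm)


print(grad_norm_is_nan(torch.tensor(float("nan"))))  # True
print(grad_norm_is_nan(2.5))                         # False
```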
Oh yeah, the isnan thing. I just deleted that line, lol. I don't know anything about Python, but I figured out how to run this repo by sheer luck.