Train WaveRnn AttributeError
wierzs opened this issue · comments
Hello,
After training Tacotron for about 250k steps, I decided to force GTA and start WaveRNN training.
I ran `train_wavernn.py --gta`; CUDA was detected and the parameters were loaded. After about 10 seconds I got this AttributeError.
Running in PowerShell:
```
PS D:\BetterTTS\WaveRNN> python train_wavernn.py --gta
Using device: cuda
Initialising Model...
Trainable Parameters: 4.234M
Restoring from latest checkpoint...
Loading latest weights: D:\BetterTTS\WaveRNN\checkpoints\ljspeech_mol.wavernn\latest_weights.pyt
Loading latest optimizer state: D:\BetterTTS\WaveRNN\checkpoints\ljspeech_mol.wavernn\latest_optim.pyt
+-------------+------------+--------+--------------+-----------+
|  Remaining  | Batch Size |   LR   | Sequence Len | GTA Train |
+-------------+------------+--------+--------------+-----------+
| 1000k Steps |     32     | 0.0001 |     1375     |   True    |
+-------------+------------+--------+--------------+-----------+
Traceback (most recent call last):
  File "train_wavernn.py", line 159, in <module>
    main()
  File "train_wavernn.py", line 85, in main
    voc_train_loop(paths, voc_model, loss_func, optimizer, train_set, test_set, lr, total_steps)
  File "train_wavernn.py", line 105, in voc_train_loop
    for i, (x, y, m) in enumerate(train_set, 1):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
    data = self._next_data()
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 881, in _process_data
    data.reraise()
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\_utils.py", line 395, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "D:\BetterTTS\WaveRNN\utils\dataset.py", line 68, in collate_vocoder
    mel_win = hp.voc_seq_len // hp.hop_length + 2 * hp.voc_pad
  File "D:\BetterTTS\WaveRNN\utils\__init__.py", line 53, in __getattr__
    raise AttributeError("HParams not configured yet. Call self.configure()")
AttributeError: HParams not configured yet. Call self.configure()
```
So I took a look in `utils/__init__.py` and found this around line 42:
```python
def __init__(self, path: Union[str, Path]=None):
    """Constructs the hyperparameters from a path to a python module. If
    `path` is None, will raise an AttributeError whenever its attributes
    are accessed. Otherwise, configures self based on `path`."""
    if path is None:
        self._configured = False
    else:
        self.configure(path)

def __getattr__(self, item):
    if not self.is_configured():
        raise AttributeError("HParams not configured yet. Call self.configure()")
    else:
        return super().__getattr__(item)
```
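For context, `__getattr__` is only called when normal attribute lookup fails, so this guard fires exactly when a hyperparameter is requested before `configure()` has copied any onto the object. Here is a minimal sketch of that pattern (with a stubbed `configure` and a made-up `voc_seq_len` value, not the repo's real loading logic):

```python
from pathlib import Path
from typing import Union


class HParams:
    """Minimal sketch of the lazy-configuration pattern."""

    def __init__(self, path: Union[str, Path] = None):
        # Normal attribute writes go to __dict__; __getattr__ below
        # only fires when a lookup FAILS, so this does not recurse.
        if path is None:
            self._configured = False
        else:
            self.configure(path)

    def is_configured(self):
        return self._configured

    def configure(self, path):
        # The real repo executes the hparams module at `path` and copies
        # its names onto self; here we just fake one hyperparameter.
        self.voc_seq_len = 1375
        self._configured = True

    def __getattr__(self, item):
        # Reached only for attributes not found by normal lookup.
        if not self.is_configured():
            raise AttributeError("HParams not configured yet. Call self.configure()")
        raise AttributeError(item)


hp = HParams()  # like the module-level `hp` in utils/__init__.py
try:
    hp.voc_seq_len  # raises until configure() has run
except AttributeError as e:
    print(e)  # HParams not configured yet. Call self.configure()

hp.configure("hparams.py")
print(hp.voc_seq_len)  # 1375
```

So the code itself is fine; the question is why `hp` is still unconfigured at the point the error fires.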
I'm an amateur in Python, but isn't this basically saying: `path` is `None` by default, and if `path` is `None`, raise an error on every attribute access? What am I missing here?
Thank you for any answers you may have!
Running on CUDA 10.1
GPU: RTX 2070
Also, `pip list`:
```
Package                Version
---------------------- -----------
absl-py                0.12.0
astor                  0.8.1
audioread              2.1.9
bleach                 1.5.0
cached-property        1.5.2
cachetools             4.2.1
certifi                2020.12.5
chardet                4.0.0
click                  7.1.2
cycler                 0.10.0
dataclasses            0.8
decorator              5.0.6
falcon                 1.2.0
ffmpeg                 1.4
future                 0.18.2
gast                   0.2.2
google-auth            1.29.0
google-auth-oauthlib   0.4.4
google-pasta           0.2.0
grpcio                 1.37.0
h5py                   3.1.0
html5lib               0.9999999
idna                   2.10
importlib-metadata     3.10.0
inflect                0.2.5
joblib                 1.0.1
Keras-Applications     1.0.8
Keras-Preprocessing    1.1.2
kiwisolver             1.3.1
librosa                0.6.3
llvmlite               0.31.0
Markdown               3.3.4
matplotlib             3.1.0
mock                   4.0.3
nltk                   3.6.2
numba                  0.48.0
numpy                  1.16.2
oauthlib               3.1.0
opt-einsum             3.3.0
pathlib                1.0.1
Pillow                 8.2.0
pip                    21.0.1
protobuf               3.15.8
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pydub                  0.25.1
pyparsing              2.4.7
python-dateutil        2.8.1
python-mimeparse       1.6.0
pytz                   2021.1
regex                  2021.4.4
requests               2.25.1
requests-oauthlib      1.3.0
resampy                0.2.2
rsa                    4.7.2
scikit-learn           0.24.1
scipy                  1.0.0
setuptools             56.0.0
six                    1.15.0
SpeechRecognition      3.8.1
srt                    3.4.1
tensorboard            1.12.0
tensorflow-estimator   1.13.0
tensorflow-gpu         1.3.0
tensorflow-tensorboard 0.1.8
termcolor              1.1.0
threadpoolctl          2.1.0
torch                  1.5.1+cu101
torchvision            0.6.1+cu101
tqdm                   4.11.2
typing-extensions      3.7.4.3
Unidecode              0.4.20
urllib3                1.26.4
Werkzeug               1.0.1
wheel                  0.36.2
wrapt                  1.12.1
zipp                   3.4.1
```
I'm not sure if we have the same issue, but try going to the `utils` folder, opening `dataset.py`, and on line 54 changing `num_workers=2,` to `num_workers=0,`.
Tell me if that works.
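For what it's worth, my understanding of why this helps (an educated guess, not verified against the repo): with `num_workers > 0`, each DataLoader worker is a separate process, and on Windows those processes are spawned fresh and re-import the project's modules, so the module-level `hp` object that was configured in the main process comes up unconfigured inside the workers' `collate_vocoder`. With `num_workers=0`, loading and collation run in the main process, where `hp` is already configured. A toy sketch of the setting:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Stand-in dataset; the real repo yields mel/quant pairs instead."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        return torch.tensor(float(idx))


# num_workers=0 loads batches in the main process, so any module-level
# state configured at startup stays visible during collation.
loader = DataLoader(ToyDataset(), batch_size=2, num_workers=0)
first = next(iter(loader))
print(first.tolist())  # [0.0, 1.0]
```

The tradeoff is that `num_workers=0` gives up parallel data loading, which can slow training down somewhat.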
Hello,
Thank you so much for the input! I tried this and now I have another error. At least this is progress.
Now it's saying:
```
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
```
I guess I just have to find out where to add `.cpu()` now. Thank you!
No problem man, glad I could help. I never had that other issue you're now having, so I can't help with that.
Ok, I got it! In `train_wavernn.py`, line 129, change
```python
if np.isnan(grad_norm):
```
to
```python
if np.isnan(grad_norm.cpu()):
```
It's training now. Thank you @Ahmad21A for the help!
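For anyone else hitting this: the gradient norm returned by newer PyTorch versions is a tensor that may live on the GPU, and `np.isnan` can't read GPU memory, hence the `.cpu()` copy. An alternative fix (a sketch, not what the repo ships) is a device-agnostic check with `torch.isnan`, which works on tensors wherever they live:

```python
import math
import torch


def grad_norm_is_nan(grad_norm):
    # clip_grad_norm_ returned a plain float in older torch versions and
    # returns a (possibly CUDA) 0-dim tensor in newer ones; torch.isnan
    # handles tensors on any device without a .cpu() copy.
    if torch.is_tensor(grad_norm):
        return bool(torch.isnan(grad_norm))
    return math.isnan(grad_norm)


print(grad_norm_is_nan(torch.tensor(float("nan"))))  # True
print(grad_norm_is_nan(2.5))                         # False
```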
Oh yeah, the isnan thing. I just deleted that line, lol. I don't know anything about Python, but I figured out how to run this repo by sheer luck.