Train WaveRnn AttributeError

wierzs opened this issue

After training Tacotron for about 250k steps, I decided to force GTA and start WaveRNN training.
I ran train_wavernn.py --gta; CUDA was detected and the parameters were loaded. After about 10 seconds I got this AttributeError:

Running in PowerShell

PS D:\BetterTTS\WaveRNN> python train_wavernn.py --gta
Using device: cuda

Initialising Model...

Trainable Parameters: 4.234M
Restoring from latest checkpoint...
Loading latest weights: D:\BetterTTS\WaveRNN\checkpoints\ljspeech_mol.wavernn\latest_weights.pyt
Loading latest optimizer state: D:\BetterTTS\WaveRNN\checkpoints\ljspeech_mol.wavernn\latest_optim.pyt
|  Remaining  | Batch Size |   LR   | Sequence Len | GTA Train |
| 1000k Steps |     32     | 0.0001 |     1375     |   True    |

Traceback (most recent call last):
  File "train_wavernn.py", line 159, in <module>
  File "train_wavernn.py", line 85, in main
    voc_train_loop(paths, voc_model, loss_func, optimizer, train_set, test_set, lr, total_steps)
  File "train_wavernn.py", line 105, in voc_train_loop
    for i, (x, y, m) in enumerate(train_set, 1):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
    data = self._next_data()
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 881, in _process_data
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\_utils.py", line 395, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "D:\BetterTTS\WaveRNN\utils\dataset.py", line 68, in collate_vocoder
    mel_win = hp.voc_seq_len // hp.hop_length + 2 * hp.voc_pad
  File "D:\BetterTTS\WaveRNN\utils\__init__.py", line 53, in __getattr__
    raise AttributeError("HParams not configured yet. Call self.configure()")
AttributeError: HParams not configured yet. Call self.configure()

So I took a look at utils/__init__.py and found this starting at line 42:

    def __init__(self, path: Union[str, Path]=None):
        """Constructs the hyperparameters from a path to a python module. If
        `path` is None, will raise an AttributeError whenever its attributes
        are accessed. Otherwise, configures self based on `path`."""
        if path is None:
            self._configured = False

    def __getattr__(self, item):
        if not self.is_configured():
            raise AttributeError("HParams not configured yet. Call self.configure()")
        return super().__getattr__(item)

I'm an amateur in Python, but isn't this basically saying: path = None, if path is None, exit()? What am I missing here?
Thank you for any answers you may have!
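
On the question above: as far as the quoted snippet shows, path=None is only the default argument. The constructor does not exit; it just marks the object as unconfigured, and the error is raised later, the first time an attribute is read before configure() has been called. A minimal sketch of that pattern follows. LazyHParams and the dict-based configure() are hypothetical stand-ins, not the repo's actual class, and the voc_pad value is illustrative.

```python
class LazyHParams:
    """Hypothetical stand-in for the HParams class quoted above."""

    def __init__(self, values=None):
        # With no values given, the object exists but holds no settings yet.
        self._configured = False
        if values is not None:
            self.configure(values)

    def configure(self, values):
        # Copy the settings onto the instance and mark it as ready.
        self.__dict__.update(values)
        self._configured = True

    def __getattr__(self, item):
        # Python calls this only when normal attribute lookup fails.
        if not self.__dict__.get("_configured", False):
            raise AttributeError("HParams not configured yet. Call self.configure()")
        raise AttributeError(f"no hyperparameter named {item!r}")


hp = LazyHParams()  # like path=None: nothing is loaded, nothing fails yet
try:
    hp.voc_pad  # reading before configure() is what triggers the error
except AttributeError as err:
    print(err)

hp.configure({"voc_pad": 2})  # illustrative value
print(hp.voc_pad)
```

So the message in the traceback most likely means that the process running collate_vocoder was looking at an hp object on which configure() had never been called.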

Running on Cuda 10.1
GPU: RTX2070
Also, PIP list:

I'm not sure if we have the same issue, but try going to the "utils" folder, opening the "dataset.py" file, and changing "num_workers=2," to "num_workers=0," on line 54.

tell me if that works
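
A likely reason this helps: with num_workers=0 the collate function runs in the main process, where hp has already been configured. On Windows, DataLoader workers are started as fresh processes that re-import the modules, so configuration done at runtime in the training script is probably not visible inside them. If you want to keep workers enabled, one alternative (not what the repo does, just a sketch) is to hand the values to the collate function explicitly instead of reading a global. collate_windows is a hypothetical name; seq_len=1375 is taken from the training table above, while hop_length and pad are illustrative values.

```python
from functools import partial


def collate_windows(batch, seq_len, hop_length, pad):
    """Crop every mel (here just a list of frames) to a fixed window.

    The window length uses the same formula as the line in the traceback:
    seq_len // hop_length + 2 * pad.
    """
    mel_win = seq_len // hop_length + 2 * pad
    return [mel[:mel_win] for mel in batch]


# Bind the hyperparameters once in the main process. hop_length=275 and
# pad=2 are assumptions for illustration, not read from a real hparams file.
collate_fn = partial(collate_windows, seq_len=1375, hop_length=275, pad=2)

# Two fake "mels" of different lengths stand in for a batch.
batch = [list(range(20)), list(range(30))]
cropped = collate_fn(batch)
print([len(mel) for mel in cropped])  # [9, 9]
```

A partial built from a module-level function can be pickled, so it could be passed as collate_fn= to a DataLoader and carry its values into the workers with it.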

Thank you so much for giving input! I tried this and now I have another error. At least this is progress.
Now it's saying:
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

I guess I just have to find out where to add .cpu() now. Thank you!

np man glad i could help, i never had that other issue you are now having so cant help with that

Ok, I got it! In train_wavernn.py, line 129:
Change if np.isnan(grad_norm):
to if np.isnan(grad_norm.cpu()):

It's training now. Thank you @Ahmad21A for the help!
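
For anyone landing here later: the TypeError appears because NumPy cannot read a tensor that lives on the GPU. Besides adding .cpu(), another option is to convert the norm to a plain Python float before checking it. A minimal sketch, assuming grad_norm is a scalar (a float or a single-element tensor); grad_norm_is_nan is a hypothetical helper, not something in the repo.

```python
import math


def grad_norm_is_nan(grad_norm):
    """Return True if a scalar gradient norm is NaN.

    float() accepts a plain Python number and also a single-element
    torch tensor, including one on a CUDA device, so no NumPy
    conversion is needed.
    """
    return math.isnan(float(grad_norm))


# Plain floats stand in for the value returned by gradient clipping.
print(grad_norm_is_nan(0.73))          # False
print(grad_norm_is_nan(float("nan")))  # True
```

In the training loop this would replace the np.isnan(grad_norm) check, which is safer than deleting the check altogether.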

oh yeaaa the isnan thing, i just deleted that line lol, idk anything about python but i figured out how to run this repo by sheer luck