jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Home Page: https://jaywalnut310.github.io/vits-demo/index.html

Can't train with fp16 on Nvidia P100

54696d21 opened this issue · comments

commented

Training with fp16 doesn't work for me on a P100. I'll look into fixing it, but for future reference, here is the full stack trace.
torch version: 1.9.0

2021-06-29 10:29:09.537741: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
/usr/local/lib/python3.7/dist-packages/torch/functional.py:472: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at  /pytorch/aten/src/ATen/native/SpectralOps.cpp:664.)
  normalized, onesided, return_complex)
Traceback (most recent call last):
  File "train_ms.py", line 294, in <module>
    main()
  File "train_ms.py", line 50, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/content/vits/train_ms.py", line 118, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/content/vits/train_ms.py", line 192, in train_and_evaluate
    scaler.scale(loss_gen_all).backward()
  File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 149, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: "fill_cuda" not implemented for 'ComplexHalf' 
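
For reference, the ComplexHalf tensors come out of torch.stft: under autocast the generator output y_hat is float16, and the STFT of a float16 CUDA tensor goes through half-precision complex intermediates whose backward needs to fill a ComplexHalf tensor, which fill_cuda does not support. A minimal sketch (not taken from the repo; shapes and STFT parameters are arbitrary) that should trigger the same error on torch 1.9:

import torch

# float16 audio-like tensor, standing in for the generator output under autocast
y = torch.randn(1, 8192, device='cuda', dtype=torch.float16, requires_grad=True)
window = torch.hann_window(1024, device='cuda', dtype=torch.float16)
spec = torch.stft(y, 1024, hop_length=256, win_length=1024, window=window,
                  center=False, pad_mode='reflect', normalized=False,
                  onesided=True, return_complex=False)
loss = torch.sqrt(spec.pow(2).sum(-1) + 1e-6).sum()
loss.backward()  # RuntimeError: "fill_cuda" not implemented for 'ComplexHalf'
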
commented

The problem does not occur on torch 1.6.0, the version pinned in requirements.txt.

commented

Use the nvcr.io/nvidia/pytorch:20.07-py Docker image.

I have the same issue. Any idea where the complex number is generated?
(It works fine on torch 1.6, but I want to combine this with other code that requires PyTorch 1.9.)

I got the same issue. It's due to a bug in the PyTorch STFT function for half-precision tensors. The workaround is to move the calculation of y_hat_mel in train.py outside autocast, and to cast y_hat to float on the line above the y_hat_mel calculation.
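
A minimal, self-contained sketch of that pattern (the one-layer net, shapes, and STFT parameters below are stand-ins, not the repo's train.py or its config): the forward pass stays under autocast, while the spectrogram loss is computed on a float32 copy of the output, outside autocast, so the autograd graph never contains ComplexHalf tensors.

import torch
from torch.cuda.amp import autocast, GradScaler

device = 'cuda'
net = torch.nn.Conv1d(1, 1, 3, padding=1).to(device)   # stand-in for the generator
opt = torch.optim.AdamW(net.parameters(), lr=1e-4)
scaler = GradScaler()
y = torch.randn(2, 1, 8192, device=device)             # stand-in for a batch of audio
window = torch.hann_window(1024, device=device)

with autocast(enabled=True):
    y_hat = net(y)                                      # float16 under autocast
    # calling torch.stft on y_hat here is what breaks backward()

# workaround: cast to float32 and compute the spectrograms outside autocast
spec_hat = torch.stft(y_hat.float().squeeze(1), 1024, hop_length=256, win_length=1024,
                      window=window, return_complex=False)
spec_ref = torch.stft(y.squeeze(1), 1024, hop_length=256, win_length=1024,
                      window=window, return_complex=False)
loss = torch.nn.functional.l1_loss(spec_hat, spec_ref)

scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
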

@boltzmann-Li Can you create a PR so we can see that fix? I haven't managed to get it working following your instructions.

FYI, the problem hasn't been fixed in torch 1.10.0.

Is there an issue for the ComplexHalf problem?

I created a pull request. It has been working for me with 3090 GPUs and torch 1.9.

I think a better way to solve this is to wrap the torch.stft call with autocast(enabled=False) inside the mel_spectrogram_torch function. Here is the code:

import torch
from torch.cuda.amp import autocast
from librosa.filters import mel as librosa_mel_fn

# mel_basis and hann_window are the module-level caches from mel_processing.py;
# spectral_normalize_torch is also defined in that file
mel_basis = {}
hann_window = {}


def mel_spectrogram_torch(y, n_fft, num_mels, sampling_rate, hop_size, win_size, fmin, fmax, center=False):
    if torch.min(y) < -1.:
        print('min value is ', torch.min(y))
    if torch.max(y) > 1.:
        print('max value is ', torch.max(y))

    global mel_basis, hann_window
    dtype_device = str(y.dtype) + '_' + str(y.device)
    fmax_dtype_device = str(fmax) + '_' + dtype_device
    wnsize_dtype_device = str(win_size) + '_' + dtype_device
    if fmax_dtype_device not in mel_basis:
        mel = librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)
        mel_basis[fmax_dtype_device] = torch.from_numpy(mel).to(dtype=y.dtype, device=y.device)
    if wnsize_dtype_device not in hann_window:
        hann_window[wnsize_dtype_device] = torch.hann_window(win_size).to(dtype=y.dtype, device=y.device)

    y = torch.nn.functional.pad(y.unsqueeze(1), (int((n_fft-hop_size)/2), int((n_fft-hop_size)/2)), mode='reflect')
    y = y.squeeze(1)

    # run the STFT in float32 even when the caller is inside an autocast region,
    # so torch.stft never produces ComplexHalf tensors
    with autocast(enabled=False):
        y = y.float()
        spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size,
                          window=hann_window[wnsize_dtype_device].float(),  # window must match the float32 input
                          center=center, pad_mode='reflect', normalized=False, onesided=True)

    spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)

    spec = torch.matmul(mel_basis[fmax_dtype_device], spec)
    spec = spectral_normalize_torch(spec)

    return spec
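
If it helps verify the change, here is a hypothetical smoke test (shapes, sample rate, and STFT parameters are arbitrary, and it assumes torch 1.9 on CUDA with the librosa version pinned in requirements.txt): a half-precision tensor standing in for y_hat should now go through the mel pipeline and backpropagate without the ComplexHalf error.

import torch
from torch.cuda.amp import autocast

# stand-in for the generator output y_hat under autocast (float16 on CUDA)
y_hat = torch.randn(2, 8192, device='cuda', dtype=torch.float16, requires_grad=True)

with autocast(enabled=True):
    y_hat_mel = mel_spectrogram_torch(y_hat, 1024, 80, 22050, 256, 1024, 0.0, None)

y_hat_mel.float().sum().backward()  # no ComplexHalf error: the STFT ran in float32
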