Audio buffer and Padding size problems

teinhonglo opened this issue 4 years ago

Hi,
I am trying to train the PASE model from scratch, and I get the following two errors during training: "Audio buffer is not finite everywhere" and "Padding size should be less than the corresponding input dimension".
To work around the first problem, I tried adding np.nan_to_num(y) before line 706 of transforms.py, but I don't think this is a good solution.
I have no idea what causes these two problems.
Any suggestions?

Audio buffer is not finite everywhere

Traceback (most recent call last):
  File "train.py", line 465, in <module>
    train(opts)
  File "train.py", line 333, in train
    Trainer.train_(dloader, device=device, valid_dataloader=va_dloader)
  File "/home/teinhonglo/pase/pase/models/WorkerScheduler/trainer.py", line 223, in train_
    batch = next(iterator)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
    return self._process_data(data)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
librosa.util.exceptions.ParameterError: Caught ParameterError in DataLoader worker process 8.
Original Traceback (most recent call last):
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/teinhonglo/pase/pase/dataset.py", line 492, in __getitem__
    pkg = self.transform(pkg)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torchvision/transforms/transforms.py", line 70, in __call__
    img = t(img)
  File "/home/teinhonglo/pase/pase/transforms.py", line 706, in __call__
    hop_length=self.hop,
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/feature/spectral.py", line 1442, in mfcc
    S = power_to_db(melspectrogram(y=y, sr=sr, **kwargs))
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/feature/spectral.py", line 1531, in melspectrogram
    power=power)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/core/spectrum.py", line 1557, in _spectrogram
    S = np.abs(stft(y, n_fft=n_fft, hop_length=hop_length))**power
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/core/spectrum.py", line 161, in stft
    util.valid_audio(y)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/librosa/util/utils.py", line 170, in valid_audio
    raise ParameterError('Audio buffer is not finite everywhere')
librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere
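
The exception above comes from librosa's valid_audio check, which rejects any buffer containing NaN or Inf. Instead of replacing those values blindly with np.nan_to_num, one option is to locate them first so the file or transform that produces them can be identified. The following is only a minimal sketch, assuming NumPy is installed and the waveform is a float array; the helper name report_non_finite is hypothetical and is not part of PASE or librosa.

```python
import numpy as np


def report_non_finite(y):
    """Return (n_nan, n_inf, first_bad_index) for a waveform.

    first_bad_index is None when every sample is finite.
    The helper name is hypothetical (not a PASE or librosa function).
    """
    y = np.asarray(y, dtype=np.float64)
    bad = ~np.isfinite(y)                      # True where sample is NaN or +/-Inf
    n_nan = int(np.isnan(y).sum())
    n_inf = int(np.isinf(y).sum())
    first = int(np.flatnonzero(bad)[0]) if bad.any() else None
    return n_nan, n_inf, first


clean = np.array([0.0, 0.5, -0.5])
broken = np.array([0.0, np.nan, np.inf, 0.1])
print(report_non_finite(clean))   # (0, 0, None)
print(report_non_finite(broken))  # (1, 1, 1)
```

Running a check like this just before the MFCC computation, and logging the file name whenever a bad index is reported, would show whether the non-finite samples are already in the loaded audio or are introduced by an earlier transform.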

Padding size should be less than the corresponding input dimension

Epoch 0/10: 5%|#####3 | 242/5205 [05:53<3:40:07, 2.66s/it]
Traceback (most recent call last):
  File "train.py", line 465, in <module>
    train(opts)
  File "train.py", line 333, in train
    Trainer.train_(dloader, device=device, valid_dataloader=va_dloader)
  File "/home/teinhonglo/pase/pase/models/WorkerScheduler/trainer.py", line 223, in train_
    batch = next(iterator)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
    return self._process_data(data)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/teinhonglo/.local/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/teinhonglo/pase/pase/dataset.py", line 492, in __getitem__
    pkg = self.transform(pkg)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torchvision/transforms/transforms.py", line 70, in __call__
    img = t(img)
  File "/home/teinhonglo/pase/pase/transforms.py", line 427, in __call__
    pkg['chunk_rand'] = self.select_chunk(raw_rand)
  File "/home/teinhonglo/pase/pase/transforms.py", line 317, in select_chunk
    mode=self.pad_mode).view(-1)
  File "/usr/local/bin/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 2868, in pad
    return torch._C._nn.reflection_pad1d(input, pad)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (0, 28656) at dimension 2 of input [1, 1, 3344]
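
The numbers in this message add up to 3344 + 28656 = 32000, which suggests the transform is trying to extend a 3344-sample signal to a 32000-sample chunk. Reflection padding mirrors existing samples, so, as the error text says, the padding must be smaller than the input length, and 28656 is not smaller than 3344. The check can be sketched in plain Python. The function names and the 32000 chunk size are assumptions inferred from the traceback, not taken from the PASE code.

```python
def reflect_pad_needed(n_samples, chunk_size):
    """Right-side padding needed to reach chunk_size (0 if already long enough)."""
    return max(0, chunk_size - n_samples)


def reflect_pad_is_valid(n_samples, chunk_size):
    """Reflection padding is only possible when the pad is smaller than the input."""
    pad = reflect_pad_needed(n_samples, chunk_size)
    return pad < n_samples


# Values from the traceback: 3344 input samples, padding of 28656.
# The chunk size of 32000 is inferred as their sum (an assumption).
print(reflect_pad_needed(3344, 32000))     # 28656
print(reflect_pad_is_valid(3344, 32000))   # False
print(reflect_pad_is_valid(20000, 32000))  # True, since 12000 < 20000
```

Under this reading, any signal no longer than half the chunk size (16000 samples here) would trigger the error, so such signals would need to be filtered out, padded with a different mode, or checked for decoding problems that make them shorter than expected.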

Shruti Mittal · Answer 1 · Wed Mar 11 2020 14:47:38 GMT+0800 (China Standard Time)

Hey, did you segment your data? I think I got a similar error when I didn't.

Tien-Hong Lo · Answer 2 · Thu Mar 12 2020 17:26:48 GMT+0800 (China Standard Time)

> Hey, did you segment your data? I think I got a similar error when I didn't.

Do you mean segmenting the data into train/valid/test sets?
I wrote the train/valid files to train.scp and the test files to test.scp, and I created symbolic links to the train/valid/test wavs in data/wavs.

Shruti Mittal · Answer 3 · Fri Mar 13 2020 17:38:35 GMT+0800 (China Standard Time)

No, check the script at /data/prep/prepare_segmented_dataset_libri.py

Polina Turishcheva · Answer 4 · Tue Jan 12 2021 19:43:54 GMT+0800 (China Standard Time)

For me, using soundfile.read instead of torchaudio.load solved the padding issue (I used .ogg files, not wavs).

uuwz · Answer 5 · Thu Sep 07 2023 20:17:42 GMT+0800 (China Standard Time)

Hello! I have recently been trying to replicate this experiment, but while creating the dataset config file I could not find the following files. May I ask where to obtain them?
--train_scp data/LibriSpeech/libri_tr.scp --test_scp data/LibriSpeech/libri_te.scp --libri_dict data/LibriSpeech/libri_dict.npy
I look forward to your reply. Thank you.