santi-pdp / pase

Problem Agnostic Speech Encoder


Question about distortion file

RE-N-Y opened this issue · comments

Hi,

I'm trying to fine-tune the PASE+ model on my own dataset, but the training script fails with the error below. I was able to produce the stats file and the .scp files correctly with the provided Python script.

Here's the output from train.py:

[!] Using CPU
Seeds initialized to 2
{'regr': [{'num_outputs': 1, 'dropout': 0, 'dropout_time': 0.0, 'hidden_layers': 1, 'name': 'cchunk', 'type': 'decoder', 'hidden_size': 64, 'fmaps': [512, 256, 128], 'strides': [4, 4, 10], 'kwidths': [30, 30, 30], 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a49d0>}, {'num_outputs': 3075, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'lps', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d58bc10>, 'skip': False}, {'num_outputs': 3075, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'lps_long', 'context': 1, 'r': 7, 'transform': {'win': 512}, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4a50>, 'skip': False}, {'num_outputs': 120, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'fbank', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4a90>, 'skip': False}, {'num_outputs': 120, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'fbank_long', 'context': 1, 'r': 7, 'transform': {'win': 1024, 'n_fft': 1024}, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4ad0>, 'skip': False}, {'num_outputs': 120, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'gtn', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4b10>, 'skip': False}, {'num_outputs': 120, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'gtn_long', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4b50>, 'transform': {'win': 2048}, 'skip': False}, {'num_outputs': 39, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'mfcc', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4b90>, 'skip': False}, {'num_outputs': 60, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'mfcc_long', 'context': 1, 'r': 7, 'transform': {'win': 2048, 'order': 20}, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4bd0>, 'skip': False}, {'num_outputs': 12, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'prosody', 'context': 1, 'r': 7, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4c10>, 'skip': False}], 'cls': [{'num_outputs': 1, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'mi', 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4cd0>, 'skip': False, 'keys': ['chunk', 'chunk_ctxt', 'chunk_rand']}, {'num_outputs': 1, 'dropout': 0, 'hidden_size': 256, 'hidden_layers': 1, 'name': 'cmi', 'augment': True, 'loss': <pase.losses.ContextualizedLoss object at 0x2b3d9d6a4d90>, 'skip': False, 'keys': ['chunk', 'chunk_ctxt', 'chunk_rand']}]}
Compose(
    ToTensor()
    MIChunkWav(32000)
    LPS(n_fft=2048, hop=160, win=400, device=cpu)
    LPS(n_fft=2048, hop=160, win=512, device=cpu)
    FBanks(n_fft=512, n_filters=40, hop=160, win=400
    FBanks(n_fft=1024, n_filters=40, hop=160, win=1024
    Gammatone(f_min=500, n_channels=40, hop=160, win=400)
    Gammatone(f_min=500, n_channels=40, hop=160, win=2048)
    MFCC(order=13, sr=16000)
    MFCC(order=20, sr=16000)
    Prosody(hop=160, win=320, f0_min=60, f0_max=300, sr=16000)
    ZNorm(data/PARK_stats.pkl)
)
Preparing dset for <MY DATASET FOLDER>
Found 0 *.npy ir_files in data/omologo_revs_bin
It seems the problem is that there is no folder called omologo_revs_bin inside data/; judging by the last log line, the loader just looks for *.npy impulse responses there (as sketched below). If so, is it possible to get these files?

Thank you in advance!
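For context, that last log line suggests the data preparation simply globs the folder for impulse-response arrays. A minimal sketch of such a check, inferred from the message rather than taken from the repo's code:

    import glob

    # The log line is consistent with a plain glob over the folder:
    ir_files = glob.glob('data/omologo_revs_bin/*.npy')
    print('Found {} *.npy ir_files in data/omologo_revs_bin'.format(len(ir_files)))
    # A count of 0 just means the folder is missing or empty.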

Hi, apologies for the delay with this. We did not release these data-augmentation RIRs; instead, you can use the 16 kHz RIRs available from the OpenSLR page. The results are comparable. See the top-level README, which shows how to use them.
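For anyone reproducing this, here is a minimal sketch of turning OpenSLR RIR wavs into the *.npy impulse responses the trainer globs for. The source and destination paths are placeholders, and the assumption that each RIR should be stored as a 1-D float32 array at 16 kHz is mine, not the repo's:

    import glob
    import os

    import numpy as np
    import soundfile as sf
    from scipy.signal import resample_poly

    SRC = 'RIRS_NOISES/simulated_rirs'  # e.g. unpacked from openslr.org/28
    DST = 'data/my_revs_bin'            # point the training config here

    os.makedirs(DST, exist_ok=True)
    wav_paths = sorted(glob.glob(os.path.join(SRC, '**', '*.wav'),
                                 recursive=True))
    for i, wav in enumerate(wav_paths):
        ir, sr = sf.read(wav, dtype='float32')
        if ir.ndim > 1:      # keep a single channel
            ir = ir[:, 0]
        if sr != 16000:      # resample to the 16 kHz training rate
            ir = resample_poly(ir, 16000, sr).astype('float32')
        np.save(os.path.join(DST, 'rir_{:05d}.npy'.format(i)), ir)

The data config's reverb folder can then point at data/my_revs_bin instead of the missing data/omologo_revs_bin.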

commented

Hello! I have been replicating this experiment recently, but while preparing the dataset config file I could not find where to obtain these files: --train_scp data/LibriSpeed/libri_tr.scp --test_scp data/LibriSpeed/libri_te.scp --libri_dict data/LibriSpeed/libri_dict.npy. I look forward to your reply. Thank you.
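For reference, those files are generated rather than downloaded: the repo's unsupervised_data_cfg_librispeech.py (see the top-level README) builds them from a LibriSpeech tree. As a rough illustration of their likely shape (one wav path per line in each .scp, and a pickled metadata dict in the .npy; the split ratio, key format, and fields below are guesses, so check the script for the exact schema):

    import glob
    import os

    import numpy as np

    # Illustrative only: approximates the files the config script emits.
    wavs = sorted(glob.glob('data/LibriSpeed/wavs/**/*.wav', recursive=True))
    split = int(0.95 * len(wavs))  # guessed train/test split

    with open('data/LibriSpeed/libri_tr.scp', 'w') as f:
        f.writelines(w + '\n' for w in wavs[:split])
    with open('data/LibriSpeed/libri_te.scp', 'w') as f:
        f.writelines(w + '\n' for w in wavs[split:])

    # Guessed schema: utterance file name -> speaker id taken from the path.
    libri_dict = {os.path.basename(w):
                  {'speaker': os.path.basename(os.path.dirname(w))}
                  for w in wavs}
    np.save('data/LibriSpeed/libri_dict.npy', libri_dict)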