DigitalPhonetics / IMS-Toucan

Hi,

Out of curiosity, I want to test BigVGAN. Their page says it accepts .npy files as input. I browsed the code but could not find where the mel spectrogram is generated.

Could you please point me to the line of code where I can save the spectrogram, so that I can run BigVGAN on it manually?

Thanks in advance for your help

The next release will include BigVGAN; it's already in one of the experimental branches. It works extremely well, especially when paired with the discriminators that Avocodo adds. But it is also very slow, unfortunately.

The spectrograms are available here:

IMS-Toucan/InferenceInterfaces/PortaSpeechInterface.py

Line 185 in e41e266

mel = mel.transpose(0, 1)

I'm not sure their spectrogram settings are the same as ours, though, so their model may not work out of the box with outputs from this TTS.
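
For saving the spectrogram as .npy, something along these lines should work as a rough sketch. It assumes `mel` at that point is a torch tensor (or anything NumPy can convert). The helper name `save_mel_as_npy`, the output path, and the dummy shape are made up for illustration, and nothing here checks whether the layout or mel settings match what BigVGAN expects.

```python
import os
import tempfile

import numpy as np


def save_mel_as_npy(mel, path):
    """Save a mel spectrogram to `path` as a float32 .npy file and return the array."""
    # Torch tensors have to be detached and moved to the CPU before conversion.
    # Duck-typing on `detach` avoids importing torch in this sketch.
    if hasattr(mel, "detach"):
        mel = mel.detach().cpu().numpy()
    mel = np.asarray(mel, dtype=np.float32)
    np.save(path, mel)
    return mel


# Stand-in for the real spectrogram: random values with an assumed
# (frames, mel_bins) shape of (120, 80). Replace with the actual `mel`.
dummy_mel = np.random.rand(120, 80)
out_path = os.path.join(tempfile.gettempdir(), "mel_example.npy")
saved = save_mel_as_npy(dummy_mel, out_path)

# Load the file back to confirm it round-trips.
reloaded = np.load(out_path)
print(reloaded.shape, reloaded.dtype)
```

In the interface code you would call the helper on the real `mel` right after the line linked above. If the other vocoder expects a different axis order, transpose before saving.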

Thank you, will give this a try!

Where can I intercept the mel spectrogram to save it as .npy?