DigitalPhonetics / IMS-Toucan

Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.

Where can I intercept the mel spectrogram to save it as .npy?

Ca-ressemble-a-du-fake opened this issue

Hi,

Out of curiosity, I want to test BigVGAN. Its page says that it accepts .npy files as input. I browsed the code but could not find where the mel spectrogram is generated.

Could you please show me the line of code where I could save it, so that I can run BigVGAN manually?

Thanks in advance for your help

The next release will include BigVGAN; it's already in one of the experimental branches. It works extremely well, especially when it's paired with the discriminators that Avocodo adds. But it is also very slow, unfortunately.

Here are the spectrograms:

mel = mel.transpose(0, 1)

I'm not sure their spectrogram settings are the same as ours, though, so their model may not work out of the box with outputs from this TTS.
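For saving the spectrogram at that point, something like the following might work. It is only a rough sketch: the helper name `save_mel_as_npy` and the output path are made up for illustration, and it assumes the spectrogram is a 2-D torch tensor or NumPy array. Which axis order BigVGAN expects is not checked here, so compare the saved shape against their inference script first.

```python
import numpy as np


def save_mel_as_npy(mel, path):
    """Save a 2-D spectrogram to `path` as float32 .npy and return the array.

    Hypothetical helper, not part of IMS-Toucan or BigVGAN.
    """
    # Torch tensors expose detach(); move them to CPU and convert to NumPy.
    # Plain NumPy arrays (or nested lists) skip this step.
    if hasattr(mel, "detach"):
        mel = mel.detach().cpu().numpy()
    mel = np.asarray(mel, dtype=np.float32)
    # Assumption: a single utterance without a batch dimension.
    if mel.ndim != 2:
        raise ValueError(f"expected a 2-D spectrogram, got shape {mel.shape}")
    np.save(path, mel)
    return mel
```

Calling `save_mel_as_npy(mel, "utterance.npy")` right after the line above and then checking `np.load("utterance.npy").shape` would show whether the axes need to be swapped before handing the file to another vocoder.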

Thank you, will give this a try!