fatchord / WaveRNN

WaveRNN Vocoder + TTS

Home Page: https://fatchord.github.io/model_outputs/

why n_FFT != win_length

macarbonneau opened this issue

I was wondering if there was a reason to use a smaller window than the frame length. Is this by design or an overlooked detail?

@macarbonneau I'm following the 12.5 ms hop, 50 ms window described in the Tacotron paper to the letter. And since I'm used to doing STFT with power-of-2 FFT sizes, I just went with that. It doesn't cause any problems afaik.
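For readers following along, here is a rough sketch of the arithmetic behind those numbers. The thread never states the sample rate; 22050 Hz is an assumption, chosen because it matches the "1024 (46 ms)" figure quoted below. The helper names `ms_to_samples` and `next_power_of_two` are hypothetical and only for illustration.

```python
SAMPLE_RATE = 22050  # assumption: not stated in the thread


def ms_to_samples(ms, sample_rate=SAMPLE_RATE):
    """Convert a duration in milliseconds to a whole number of samples (truncated)."""
    return int(ms / 1000 * sample_rate)


def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    return 1 << (n - 1).bit_length()


hop_length = ms_to_samples(12.5)       # 12.5 ms hop
win_length = ms_to_samples(50)         # 50 ms window
n_fft = next_power_of_two(win_length)  # FFT size rounded up to a power of 2

print(hop_length, win_length, n_fft)
```

Under that assumption the window comes out at roughly 1100 samples while the FFT size is 2048, which is the mismatch the issue title asks about.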

Oh ok! For what it's worth, we used 1024 (46 ms) and it works just fine. Anyway, I guess it does not hurt to use 2048 samples (as long as the window is centered on the time step mark); you just get more spectral resolution than the windowed signal can express.
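A minimal NumPy sketch of the "more resolution than the windowed signal can express" point, under stated assumptions: the frame is random noise, the window is a Hann window, and the padded FFT size is set to exactly twice the window length (2200, a hypothetical value rather than the 2048 used in the repo) so that the bins of the two transforms line up.

```python
import numpy as np

win_length = 1100              # window size discussed in the thread
n_fft_padded = 2 * win_length  # hypothetical: exact multiple so the bins align

rng = np.random.default_rng(0)
# One windowed frame of synthetic noise (stand-in for real audio).
frame = rng.standard_normal(win_length) * np.hanning(win_length)

# FFT with no padding, and FFT of the same frame zero-padded at the end.
spec_plain = np.fft.rfft(frame, n=win_length)
spec_padded = np.fft.rfft(frame, n=n_fft_padded)

# Every second bin of the padded spectrum coincides with the unpadded one;
# the bins in between are interpolated values of the same underlying spectrum.
same_bins = np.allclose(spec_padded[::2], spec_plain)
print(spec_plain.shape, spec_padded.shape, same_bins)
```

Note that `np.fft.rfft` pads at the end of the frame rather than symmetrically around it; shifting the frame inside the padded buffer changes only the phase of each bin, not its magnitude.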

Thank you for the quick response and more importantly, thank you for the nice repo!

I discussed this with a colleague and he confirmed what I was worried about: if you zero-pad your signal, you are in fact introducing a lot of distortion into your spectrum. Since your results are good in any case, this is not too problematic, but I expect you could get better quality by removing the zero padding. You can use an nFFT of 1100; it won't slow down the algorithm too much... or at all.

Thx again!