Why n_FFT != win_length?



macarbonneau opened this issue 5 years ago

Marc-André Carbonneau commented 5 years ago

I was wondering if there was a reason to use a window (win_length) that is smaller than the FFT frame length (n_fft). Is this by design or an overlooked detail?

Ollie McCarthy · Answer 1 · Thu Nov 28 2019 16:07:07 GMT+0800 (China Standard Time)

@macarbonneau I'm following the 12.5 ms hop and 50 ms window described in the Tacotron paper to the letter. And since I'm used to doing STFTs with powers of 2, I just went with that for n_fft. It doesn't cause any problems afaik.
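To make the numbers concrete, here is a small sketch that converts the paper's millisecond settings into sample counts and rounds the window up to the next power of two. The 22050 Hz sample rate is an assumption (it matches the "1024 samples = 46 ms" figure mentioned later in the thread), and the helper names are hypothetical, not taken from the repo.

```python
SAMPLE_RATE = 22050  # assumed sample rate; consistent with "1024 samples = 46 ms"


def ms_to_samples(ms, sr=SAMPLE_RATE):
    """Convert a duration in milliseconds to a whole number of samples (truncated)."""
    return int(ms * sr / 1000)


def samples_to_ms(n, sr=SAMPLE_RATE):
    """Convert a sample count back to milliseconds."""
    return 1000 * n / sr


def next_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


hop_length = ms_to_samples(12.5)  # 12.5 ms hop
win_length = ms_to_samples(50)    # 50 ms window
n_fft = next_pow2(win_length)     # power-of-two FFT size that fits the window

print(hop_length, win_length, n_fft)  # 275 1102 2048
print(round(samples_to_ms(1024), 2))  # 46.44
```

Under this assumption the 50 ms window comes out at roughly 1100 samples, so rounding up to a power of two pads each frame to 2048 points.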

Marc-André Carbonneau · Answer 2 · Thu Nov 28 2019 23:22:17 GMT+0800 (China Standard Time)

Oh OK! For what it's worth, we used 1024 (46 ms) and it worked just fine. Anyway, I guess it doesn't hurt to use 2048 samples (as long as the window is centered on the time-step mark); you just get finer frequency sampling than the windowed signal can actually resolve.
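The point about resolution can be put in numbers: n_fft sets the spacing between frequency bins, while the window length sets how wide a single sinusoid's peak is. The sketch below assumes a 22050 Hz sample rate, a 1100-sample Hann window, and the standard result that a Hann window's main lobe is 4 bins wide (null to null) at its own length; the helper names are hypothetical.

```python
SAMPLE_RATE = 22050  # assumed sample rate
WIN_LENGTH = 1100    # ~50 ms window at the assumed rate


def bin_spacing_hz(n_fft, sr=SAMPLE_RATE):
    """Distance in Hz between adjacent STFT frequency bins."""
    return sr / n_fft


def hann_mainlobe_hz(win_length, sr=SAMPLE_RATE):
    """Null-to-null main-lobe width of a Hann window (4 bins of a win_length-point DFT)."""
    return 4 * sr / win_length


for n in (1100, 2048):
    print(n, round(bin_spacing_hz(n), 2))  # 1100 -> 20.05 Hz, 2048 -> 10.77 Hz
print("Hann main lobe:", round(hann_mainlobe_hz(WIN_LENGTH), 2))  # 80.18 Hz
```

Going from 1100 to 2048 halves the bin spacing, but the main lobe stays about 80 Hz wide either way, because it depends only on win_length.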

Thank you for the quick response and more importantly, thank you for the nice repo!

Marc-André Carbonneau · Answer 3 · Fri Nov 29 2019 06:38:52 GMT+0800 (China Standard Time)

I discussed this with a colleague and he confirmed what I was worried about: if you zero-pad your signal, you are in fact introducing a lot of distortion into your spectrum. Since your results are good in any case, this is not too problematic, but I expect you could get better quality by removing the zero padding. You could use an n_fft of 1100; it won't slow down the algorithm much, if at all.
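When win_length < n_fft, the windowed frame is zero-padded up to n_fft before the FFT (librosa, for example, centers the window inside the n_fft-long frame). The toy sketch here lets you compare the two options on a single frame: n_fft equal to win_length (no padding) versus a centered zero-padded frame. It uses a naive pure-Python DFT and a tiny window of 8 samples instead of 1100 so it runs instantly; the signal and helper names are hypothetical.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def pad_center(frame, size):
    """Zero-pad `frame` on both sides so it sits in the middle of `size` samples."""
    lpad = (size - len(frame)) // 2
    return [0.0] * lpad + list(frame) + [0.0] * (size - len(frame) - lpad)


def dft_magnitudes(frame):
    """Naive one-sided DFT magnitude spectrum with len(frame) // 2 + 1 bins."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


WIN_LENGTH = 8  # toy size; the thread's setting is about 1100
signal = [math.sin(2 * math.pi * 0.2 * t) for t in range(WIN_LENGTH)]
windowed = [s * w for s, w in zip(signal, hann(WIN_LENGTH))]

unpadded = dft_magnitudes(windowed)                            # n_fft == win_length
padded = dft_magnitudes(pad_center(windowed, 2 * WIN_LENGTH))  # n_fft == 2 * win_length

print(len(unpadded), len(padded))  # 5 9
```

With n_fft exactly twice win_length, `padded[2 * k]` has the same magnitude as `unpadded[k]`, and the odd-indexed bins are the extra in-between samples. The unpadded setting also yields fewer bins (n_fft // 2 + 1) per frame, which matters for anything sized from n_fft downstream, such as a mel filterbank.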

Thx again!