fatchord / WaveRNN

WaveRNN Vocoder + TTS

Home Page:https://fatchord.github.io/model_outputs/

Max clipping when normalizing the spectrogram

macarbonneau opened this issue · comments

Hello!
I am still going over the spectrogram extraction part of the code and I found something interesting. It seems that you are clipping the minimum value of the spectrogram as well as the maximum value during normalization.

def normalize(S):
    return np.clip((S - min_level_db) / -min_level_db, 0, 1)

When you compute the FFT, the resulting amplitudes are not normalized by the window size. This is standard for DFT implementations (numpy and scipy). It means that for a time-domain signal normalized between -1 and +1, the bin amplitudes can reach values as high as 1000. This extreme case does not happen often, but values around 50 are quite common. An amplitude of 50 translates to about 34 dB (20 * log10(50)). After normalization as above, you get (34 - (-100)) / -(-100) = 1.34, which is then clipped to 1.
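
To make the arithmetic concrete, here is a minimal numpy sketch of the effect. It is only an illustration under stated assumptions: `min_level_db = -100` is taken from the calculation above, while `n_fft = 2048`, the Hann window, the test bin `k`, and the helpers `amp_to_db` and `normalize_unclipped` are hypothetical choices of mine and may not match the repository's actual settings.

```python
import numpy as np

# Assumed values for illustration only; the repository's hyperparameters may differ.
min_level_db = -100
n_fft = 2048


def amp_to_db(x):
    # Hypothetical helper: amplitude to decibels, floored to avoid log(0).
    return 20 * np.log10(np.maximum(1e-5, x))


def normalize_unclipped(S):
    # Same formula as normalize() above, but without the clipping step.
    return (S - min_level_db) / -min_level_db


def normalize(S):
    return np.clip(normalize_unclipped(S), 0, 1)


# Worked example from the text: an amplitude of 50 is roughly 34 dB.
db_50 = amp_to_db(50.0)
print(db_50, normalize_unclipped(db_50), normalize(db_50))

# A full-scale sine (values in [-1, +1]) centered on FFT bin k.
k = 100
n = np.arange(n_fft)
frame = np.sin(2 * np.pi * k * n / n_fft)

# The unnormalized DFT scales with the window size, so the peak is in the hundreds.
spectrum = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))
peak_bin = int(np.argmax(spectrum))
peak_amplitude = spectrum[peak_bin]
peak_db = amp_to_db(peak_amplitude)
print(peak_bin, peak_amplitude, peak_db)

# Before clipping the normalized value exceeds 1; after clipping it saturates at 1.
print(normalize_unclipped(peak_db), normalize(peak_db))
```

With these assumed settings, the peak bin of a full-scale sine lands well above 0 dB, so everything above that level is flattened to 1 by the clip.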

Now I wonder: is this a bug or a feature? Maybe it helps the network generalize better, since it acts a bit like noise. I will run some experiments and let you know if I can conclude anything.

So that's it! I just wanted to let you know :)

Best,
Marc-Andre