fatchord / WaveRNN

WaveRNN Vocoder + TTS

Home Page: https://fatchord.github.io/model_outputs/


Unaligned fades in batched generation

742617000027 opened this issue · comments

While investigating possible causes for #149, we came across a potential bug in the way batched audio is concatenated in fatchord_version.py's xfade_and_unfold() function. The following figures show the shape of the fades that are applied to each batch entry before concatenation:

Fade-In
[figure]

Fade-Out
[figure]

Squared fades applied to impulse train
[figure]

Due to the way the fades are constructed, the two ramps are not aligned in time, so the signals are not mixed with equal power. Instead, two successive zero values (i.e. exactly 0) are introduced in the output signal within the overlap region between batch entries. In the following figures, vertical red lines mark the zero values in the signal:
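The misalignment can be reproduced with a small sketch. The fade construction below is reconstructed from the fix shown later in this issue and mirrors the pattern in xfade_and_unfold(): the fade-in is padded with leading silence and the fade-out with trailing silence, so the ramps occupy disjoint halves of the overlap and the crossfade of two constant signals collapses to zero in the middle (`overlap = 8` is an arbitrary small value for illustration):

```python
import numpy as np

# Original fade construction (reconstructed from this issue): the fade-in
# gets leading silence, the fade-out gets trailing silence, so the two
# equal-power ramps no longer overlap in time.
overlap = 8
silence_len = overlap // 2
fade_len = overlap - silence_len

silence = np.zeros(silence_len, dtype=np.float64)
t = np.linspace(-1, 1, fade_len, dtype=np.float64)

fade_in = np.concatenate([silence, np.sqrt(0.5 * (1 + t))])
fade_out = np.concatenate([np.sqrt(0.5 * (1 - t)), silence])

# Crossfading two constant (all-ones) segments: an aligned equal-power
# crossfade would never drop to zero, but here the fade-out has already
# reached 0 while the fade-in is still silent.
mixed = 1.0 * fade_out + 1.0 * fade_in
print(mixed)  # two successive exact zeros in the middle of the overlap
```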

Zeros on borders between batch entries
[figure]

Close up
[figure]

To fix the issue, we changed the way fade_out is constructed to the following:

import numpy as np

# overlap: crossfade length in samples (argument of xfade_and_unfold)
silence_len = overlap // 2
fade_len = overlap - silence_len
# Unity gain over the span where the fade-in holds its warmup silence
linear = np.ones(silence_len, dtype=np.float64)
t = np.linspace(-1, 1, fade_len, dtype=np.float64)
# Equal-power ramp down, now aligned with the fade-in's ramp up
fade_out = np.sqrt(0.5 * (1 - t))
fade_out = np.concatenate([linear, fade_out])
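A quick sanity check of the fixed fade-out against the unchanged fade-in: with the leading ones in fade_out aligned with the leading silence in fade_in, the squared fades sum to one everywhere in the overlap, which is the equal-power condition (the concrete `overlap` value is arbitrary):

```python
import numpy as np

overlap = 8
silence_len = overlap // 2
fade_len = overlap - silence_len
t = np.linspace(-1, 1, fade_len, dtype=np.float64)

# Fade-in as before: silence for RNN warmup, then equal-power ramp up
fade_in = np.concatenate([np.zeros(silence_len), np.sqrt(0.5 * (1 + t))])
# Fixed fade-out: full gain during the warmup span, then ramp down
fade_out = np.concatenate([np.ones(silence_len), np.sqrt(0.5 * (1 - t))])

# Equal-power crossfade: fade_in^2 + fade_out^2 == 1 at every sample
power = fade_in ** 2 + fade_out ** 2
print(power)
```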

This assumes that preceding silence is only needed for the fade-in, to account for the RNN warmup. It results in the following fade-out:
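To see the effect at a seam, the toy overlap-add below mimics what xfade_and_unfold() does for two adjacent batch entries, using constant all-ones segments (segment length and overlap are arbitrary small values, not the defaults used in the repo). With the fixed fade-out, the output never touches zero at the border:

```python
import numpy as np

overlap, seg_len = 8, 16
silence_len = overlap // 2
fade_len = overlap - silence_len
t = np.linspace(-1, 1, fade_len, dtype=np.float64)

fade_in = np.concatenate([np.zeros(silence_len), np.sqrt(0.5 * (1 + t))])
fade_out = np.concatenate([np.ones(silence_len), np.sqrt(0.5 * (1 - t))])

# Two all-ones "batch entries" faded at their shared border
a = np.ones(seg_len)
b = np.ones(seg_len)
a[-overlap:] *= fade_out
b[:overlap] *= fade_in

# Overlap-add into the unfolded output buffer
out = np.zeros(2 * seg_len - overlap)
out[:seg_len] += a
out[seg_len - overlap:] += b
print(out.min())  # stays at (or above) 1.0: no zeros at the seam
```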

New fade-out
[figure]

Squared fades (old fade-in and new fade-out) applied to impulse train
[figure]

@742617000027 I think you're right! I'll test this out. Thanks for the heads up

That's fixed now in 544cd5d