Unaligned fades in batched generation
742617000027 opened this issue · comments
While investigating possible causes for #149, we came across a potential bug in the way batched audio is concatenated in the xfade_and_unfold() function of fatchord_version.py. The following figures show the shape of the fades that are applied to each batch entry before concatenation:
Squared fades applied to impulse train
Due to the way the fades are constructed, the ramps are not aligned and the signals are not mixed with equal power. Instead, two successive zero values are introduced in the output signal within the overlap region between batch entries. In the following figures, vertical red lines mark zero values (i.e. exactly 0) in the signal:
Zeros on borders between batch entries
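The effect can be reproduced without any audio. The sketch below is a hypothetical reconstruction, not code quoted from the repository: it assumes the original fade-out was the equal-power ramp followed by silence (mirroring the fade-in), and it picks overlap = 8 only for illustration. The helper name old_fades is ours.

```python
import numpy as np

def old_fades(overlap):
    """Fades as we ASSUME they were built before the fix (not shown in this issue)."""
    silence_len = overlap // 2
    fade_len = overlap - silence_len
    silence = np.zeros(silence_len, dtype=np.float64)
    t = np.linspace(-1, 1, fade_len, dtype=np.float64)
    # Fade-in: silence first (RNN warmup), then the ramp up.
    fade_in = np.concatenate([silence, np.sqrt(0.5 * (1 + t))])
    # Assumption: fade-out was the ramp down FOLLOWED by silence.
    fade_out = np.concatenate([np.sqrt(0.5 * (1 - t)), silence])
    return fade_in, fade_out

fade_in, fade_out = old_fades(overlap=8)

# Positions in the overlap where both gains are exactly 0, so the mix is 0.
dead = np.flatnonzero((fade_in == 0) & (fade_out == 0))
# Summed power of the two gains; an equal-power crossfade would give all ones.
power = fade_in ** 2 + fade_out ** 2

print("dead samples:", dead)
print("summed power:", power)
```

Under these assumptions the fade-out reaches zero exactly where the fade-in is still silent, so the two ramps never overlap and the summed power dips to zero in the middle of the overlap region.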
To fix the issue, we changed the way fade_out is constructed to the following:
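# Inside xfade_and_unfold(); overlap is the number of overlapping samples
silence_len = overlap // 2
fade_len = overlap - silence_len
# Hold full gain while the next entry's fade-in is still silent
linear = np.ones((silence_len), dtype=np.float64)
t = np.linspace(-1, 1, fade_len, dtype=np.float64)
# Equal-power ramp down from 1 to 0
fade_out = np.sqrt(0.5 * (1 - t))
fade_out = np.concatenate([linear, fade_out])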
This assumes that preceding silence is only needed for the fade-in, to account for the RNN warmup. It results in the following fade-out:
Squared fades (old fade-in and new fade-out) applied to impulse train
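As a quick sanity check of the proposed fade-out, the following sketch verifies that the squared gains sum to one across the whole overlap. It assumes the fade-in is unchanged and consists of silence followed by the ramp sqrt(0.5 * (1 + t)); that form, the helper name new_fades, and the tested overlap values are our assumptions for illustration.

```python
import numpy as np

def new_fades(overlap):
    """Old fade-in (assumed form) paired with the proposed fade-out."""
    silence_len = overlap // 2
    fade_len = overlap - silence_len
    t = np.linspace(-1, 1, fade_len, dtype=np.float64)
    # Assumed fade-in: silence for the RNN warmup, then the ramp up.
    silence = np.zeros(silence_len, dtype=np.float64)
    fade_in = np.concatenate([silence, np.sqrt(0.5 * (1 + t))])
    # Proposed fade-out: hold full gain, then ramp down in step with the fade-in.
    linear = np.ones(silence_len, dtype=np.float64)
    fade_out = np.concatenate([linear, np.sqrt(0.5 * (1 - t))])
    return fade_in, fade_out

def is_equal_power(overlap):
    fade_in, fade_out = new_fades(overlap)
    return bool(np.allclose(fade_in ** 2 + fade_out ** 2, 1.0))

def has_dead_samples(overlap):
    fade_in, fade_out = new_fades(overlap)
    return bool(np.any((fade_in == 0) & (fade_out == 0)))

# Even and odd overlap lengths, chosen arbitrarily.
for overlap in (8, 9, 550):
    print(overlap, is_equal_power(overlap), has_dead_samples(overlap))
```

With the ramps aligned this way, the outgoing entry stays at full gain during the incoming entry's warmup silence, and both ramps cover the same samples, so no position in the overlap has both gains at zero.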
@742617000027 I think you're right! I'll test this out. Thanks for the heads up
That's fixed now in 544cd5d