fatchord / WaveRNN

WaveRNN Vocoder + TTS

Home Page: https://fatchord.github.io/model_outputs/


Unaligned fades in batched generation

742617000027 opened this issue · comments

While investigating possible causes for #149, we came across a potential bug in the way batched audio is concatenated in fatchord_version.py's xfade_and_unfold() function. The following figures show the shape of the fades that are applied to each batch entry before concatenation:

Fade-In
[figure]

Fade-Out
[figure]

Squared fades applied to impulse train
[figure]

Due to the way the fades are constructed, the two ramps are not aligned in time, so the signals are not mixed with equal power. Instead, two successive zero values (i.e. exactly 0) are introduced in the output signal within the overlap region between batch entries. In the following figures, vertical red lines mark the zero values in the signal:
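The misalignment can be reproduced with a small sketch. The fade construction below is reconstructed from the fix shown later in this issue and mirrors the pattern in xfade_and_unfold(): the fade-in is padded with leading silence and the fade-out with trailing silence, so the ramps occupy disjoint halves of the overlap and the crossfade of two constant signals collapses to zero in the middle (`overlap = 8` is an arbitrary small value for illustration):

```python
import numpy as np

# Original fade construction (reconstructed from this issue): the fade-in
# gets leading silence, the fade-out gets trailing silence, so the two
# equal-power ramps no longer overlap in time.
overlap = 8
silence_len = overlap // 2
fade_len = overlap - silence_len

silence = np.zeros(silence_len, dtype=np.float64)
t = np.linspace(-1, 1, fade_len, dtype=np.float64)

fade_in = np.concatenate([silence, np.sqrt(0.5 * (1 + t))])
fade_out = np.concatenate([np.sqrt(0.5 * (1 - t)), silence])

# Crossfading two constant (all-ones) segments: an aligned equal-power
# crossfade would never drop to zero, but here the fade-out has already
# reached 0 while the fade-in is still silent.
mixed = 1.0 * fade_out + 1.0 * fade_in
print(mixed)  # two successive exact zeros in the middle of the overlap
```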

Zeros on borders between batch entries
[figure]

Close up
[figure]

To fix the issue, we changed the way fade_out is constructed to the following:

import numpy as np

# overlap: crossfade length in samples (argument of xfade_and_unfold)
silence_len = overlap // 2
fade_len = overlap - silence_len
# Unity gain over the span where the fade-in holds its warmup silence
linear = np.ones(silence_len, dtype=np.float64)
t = np.linspace(-1, 1, fade_len, dtype=np.float64)
# Equal-power ramp down, now aligned with the fade-in's ramp up
fade_out = np.sqrt(0.5 * (1 - t))
fade_out = np.concatenate([linear, fade_out])
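A quick sanity check of the fixed fade-out against the unchanged fade-in: with the leading ones in fade_out aligned with the leading silence in fade_in, the squared fades sum to one everywhere in the overlap, which is the equal-power condition (the concrete `overlap` value is arbitrary):

```python
import numpy as np

overlap = 8
silence_len = overlap // 2
fade_len = overlap - silence_len
t = np.linspace(-1, 1, fade_len, dtype=np.float64)

# Fade-in as before: silence for RNN warmup, then equal-power ramp up
fade_in = np.concatenate([np.zeros(silence_len), np.sqrt(0.5 * (1 + t))])
# Fixed fade-out: full gain during the warmup span, then ramp down
fade_out = np.concatenate([np.ones(silence_len), np.sqrt(0.5 * (1 - t))])

# Equal-power crossfade: fade_in^2 + fade_out^2 == 1 at every sample
power = fade_in ** 2 + fade_out ** 2
print(power)
```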

This assumes that preceding silence is only needed for the fade-in, to account for the RNN warmup. It results in the following fade-out:
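To see the effect at a seam, the toy overlap-add below mimics what xfade_and_unfold() does for two adjacent batch entries, using constant all-ones segments (segment length and overlap are arbitrary small values, not the defaults used in the repo). With the fixed fade-out, the output never touches zero at the border:

```python
import numpy as np

overlap, seg_len = 8, 16
silence_len = overlap // 2
fade_len = overlap - silence_len
t = np.linspace(-1, 1, fade_len, dtype=np.float64)

fade_in = np.concatenate([np.zeros(silence_len), np.sqrt(0.5 * (1 + t))])
fade_out = np.concatenate([np.ones(silence_len), np.sqrt(0.5 * (1 - t))])

# Two all-ones "batch entries" faded at their shared border
a = np.ones(seg_len)
b = np.ones(seg_len)
a[-overlap:] *= fade_out
b[:overlap] *= fade_in

# Overlap-add into the unfolded output buffer
out = np.zeros(2 * seg_len - overlap)
out[:seg_len] += a
out[seg_len - overlap:] += b
print(out.min())  # stays at (or above) 1.0: no zeros at the seam
```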

New fade-out
[figure]

Squared fades (old fade-in and new fade-out) applied to impulse train
[figure]

@742617000027 I think you're right! I'll test this out. Thanks for the heads up

That's fixed now in 544cd5d