ZFTurbo / MVSEP-MDX23-music-separation-model

Model for MDX23 music separation contest


Question about multiple inferences in the Demucs model

owlwang opened this issue

First of all, thank you for creating this impressive project! The work you've done is truly remarkable.

While looking through the codebase, I noticed that the Demucs model is called twice during the inference process:

vocals_demucs = 0.5 * apply_model(model, audio, shifts=shifts, overlap=overlap)[0][3].cpu().numpy()

vocals_demucs += 0.5 * -apply_model(model, -audio, shifts=shifts, overlap=overlap)[0][3].cpu().numpy()

Is there a specific reason that two separate calls are required? I'm curious if this is some sort of optimization trick or if there is another technical motivation behind it.

Thanks again for building such an amazing tool!

This technique is called "Test Time Augmentation" (TTA). We run the model on the original audio and on a polarity-inverted copy (the -audio in the second call), negate the second prediction to restore the original polarity, and average the two results. It usually gives a slightly better and less noisy result.
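For illustration, here is a minimal sketch of the same trick with a generic PyTorch separator. The function name separate_with_tta and the callable model are assumptions for this example, not the project's actual API (the repository uses Demucs' apply_model as quoted above); the default stem index 3 follows the vocals index in the quoted code.

import torch

def separate_with_tta(model, audio, stem=3):
    # Illustrative helper, not part of the repository's code.
    with torch.no_grad():
        # Pass 1: run the separator on the original waveform.
        out_orig = model(audio)[stem]
        # Pass 2: run it on the polarity-inverted waveform, then negate
        # the prediction to flip it back to the original polarity.
        out_flip = -model(-audio)[stem]
    # The network is not exactly sign-symmetric, so the two passes differ
    # slightly; averaging them cancels part of that prediction noise.
    return 0.5 * (out_orig + out_flip)

The negation of the second output is the key step: without it, the two predictions would be in opposite polarity and averaging would cancel the signal instead of the noise.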

Thanks!