Question about multiple inferences in Demucs model
owlwang opened this issue
owlwang commented
First of all, thank you for creating this impressive project! The work you've done is truly remarkable.
I had a question while looking through the codebase. I noticed that the Demucs model is called twice during the inference process:
vocals_demucs = 0.5 * apply_model(model, audio, shifts=shifts, overlap=overlap)[0][3].cpu().numpy()
vocals_demucs += 0.5 * -apply_model(model, -audio, shifts=shifts, overlap=overlap)[0][3].cpu().numpy()
Is there a specific reason that two separate calls are required? I'm curious if this is some sort of optimization trick or if there is another technical motivation behind it.
Thanks again for building such an amazing tool!
Roman Solovyev commented
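This technique is called "Test Time Augmentation". We predict on the original audio and on the mirrored audio (the same signal with its sign flipped, i.e. -audio), flip the second prediction back, and then average the two results. It usually gives a slightly better and less noisy result.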
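To illustrate the idea outside the project, here is a minimal sketch using only NumPy. The names `polarity_tta` and `toy_separator` are hypothetical and not part of the codebase or of Demucs; `toy_separator` is a made-up stand-in for the `apply_model(...)[0][3].cpu().numpy()` call, built so that its error term does not change sign when the input does.

```python
import numpy as np


def polarity_tta(separate, audio):
    """Average the prediction on `audio` with the prediction on `-audio`.

    `separate` is any callable mapping a waveform array to an estimated
    source of the same shape (hypothetical stand-in for the model call).
    """
    direct = separate(audio)
    # Negate the input, then negate the output to undo the flip.
    flipped = -separate(-audio)
    return 0.5 * direct + 0.5 * flipped


def toy_separator(x):
    # Assumption for illustration only: the "true" source is 0.8 * x,
    # and the model adds an error that ignores the sign of the input.
    return 0.8 * x + 0.1 * np.abs(x)


audio = np.array([[0.5, -0.25, 1.0, -1.0]])
vocals = polarity_tta(toy_separator, audio)
```

In this toy case the sign-independent error cancels completely and `vocals` equals `0.8 * audio`. A real model's errors are only partly of this kind, which is consistent with the "slightly better" improvement described above, at the cost of running inference twice.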
owlwang commented
Thanks!