ZFTurbo / MVSEP-MDX23-music-separation-model

Model for MDX23 music separation contest


Question about multiple inferences in the Demucs model

owlwang opened this issue

First of all, thank you for creating this impressive project! The work you've done is truly remarkable.

While looking through the codebase, I noticed that the Demucs model is called twice during the inference process:

vocals_demucs = 0.5 * apply_model(model, audio, shifts=shifts, overlap=overlap)[0][3].cpu().numpy()

vocals_demucs += 0.5 * -apply_model(model, -audio, shifts=shifts, overlap=overlap)[0][3].cpu().numpy()

Is there a specific reason that two separate calls are required? I'm curious if this is some sort of optimization trick or if there is another technical motivation behind it.

Thanks again for building such an amazing tool!

This technique is called "Test Time Augmentation" (TTA). We run the model on the original audio and on a polarity-inverted copy (the -audio in the second call), negate the second prediction to restore the original polarity, and average the two results. It usually gives a slightly better and less noisy result.
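For illustration, here is a minimal sketch of the same trick with a generic PyTorch separator. The function name separate_with_tta and the callable model are assumptions for this example, not the project's actual API (the repository uses Demucs' apply_model as quoted above); the default stem index 3 follows the vocals index in the quoted code.

import torch

def separate_with_tta(model, audio, stem=3):
    # Illustrative helper, not part of the repository's code.
    with torch.no_grad():
        # Pass 1: run the separator on the original waveform.
        out_orig = model(audio)[stem]
        # Pass 2: run it on the polarity-inverted waveform, then negate
        # the prediction to flip it back to the original polarity.
        out_flip = -model(-audio)[stem]
    # The network is not exactly sign-symmetric, so the two passes differ
    # slightly; averaging them cancels part of that prediction noise.
    return 0.5 * (out_orig + out_flip)

The negation of the second output is the key step: without it, the two predictions would be in opposite polarity and averaging would cancel the signal instead of the noise.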

Thanks!