This codebase provide a minimal Pytorch tutorial of music source separation, including examples of audio data loader, models, training, and inference pipelines.
git clone https://github.com/qiuqiangkong/mini_mss
# Install Python environment
conda create --name mss python=3.8
conda activate mss
# Install Python packages dependencies.
sh env.sh
MUSDB18HQ is a music dataset with individual vocals, bass, drums, and other stems of music. We provide a mini version which contains 1% of the full dataset for users to quickly run the code.
mkdir datasets
cd datasets
wget -O mini_musdb18hq.tar https://huggingface.co/datasets/qiuqiangkong/mini_audio_datasets/resolve/main/mini_musdb18hq.tar?download=true
tar -xvf mini_musdb18hq.tar
cd ..
Users can visit https://zenodo.org/records/3338373 to download the full MUSDB18HQ dataset
python train.py
After training on a single GPU for a few minutes, users can use the "latest.pth" checkpoint to inference their favorite songs.
python inference.py
After training for around 10 - 20 minutes (20 epochs), the vocal SDR will improve from -6 dB to around 2 dB.
[1] Qiuqiang Kong, Yin Cao, Haohe Liu, Keunwoo Choi, and Yuxuan Wang. "Decoupling magnitude and phase estimation with deep resunet for music source separation." arXiv preprint arXiv:2109.05418 (2021).