musdb-XL

This is the official repository of our ISMIR 2022 paper "Towards robust music source separation on loud commercial music".

Specifically, this repo contains the code for creating the musdb-XL (and musdb-L) evaluation datasets for music source separation.

musdb-XL is an eXtremely Loud version of the musdb-hq evaluation dataset.

If you are interested in the LimitAug augmentation method that we also proposed in our paper, check the repo below.

LimitAug

TL;DR

Backgrounds (Mastering and Limiter)

Mastering is the final step of music production (excluding publication), and a limiter, a high-ratio dynamic range compressor, is a core tool for achieving heavy loudness in mastering. It squashes the audio waveform and reduces the dynamic range of the music so that it can be heard loud enough while keeping waveform values under 0 decibels relative to full scale (dBFS). If you are interested in the relationship between the music production industry and the limiter, take a look at the brief history of the 'loudness war' on its Wikipedia page (https://en.wikipedia.org/wiki/Loudness_war).

Motivation (musdb-XL)

Musdb18 is probably the most widely used benchmark dataset for the music source separation task. However, we thought that there is a huge gap between the musdb data and real-world commercial music, specifically from the perspective of music loudness. As can be seen in the figure below, the difference in loudness between commercial pop music and the training examples is huge (we used training examples generated by the dataloader from the official implementation of Open-unmix). However, because the musdb evaluation subset is also much quieter than recent pop music, the performance degradation that comes from this domain mismatch could hardly be confirmed.

Therefore, we propose musdb-XL, a limiter-applied evaluation subset of musdb-hq. The proposed dataset has loudness comparable to that of recent pop music!

How we made musdb-XL

Personally, I've been digging deep into music production in recent years, especially music mixing and mastering. Getting to know how music is made, how to use music effect plug-ins and DAWs, and most importantly, how to listen, was a great opportunity for me as an MIR researcher. Fortunately, I was able to publish my very first self-produced single. Please come listen, and press like and subscribe! (Spotify, Apple Music, YouTube, YouTube Music)

Back to the point: to make the musdb-XL data, we first applied the iZotope Ozone 9 Maximizer (Elements version) to all of the musdb-hq songs. All of the parameters were chosen manually: I tweaked the limiter threshold, listened, and adjusted the release parameter in IRC 2 mode.

After making the musdb-XL mixtures, we made the musdb-XL stems. As you can see in the figure above, a limiter is essentially a time-varying volume controller (the blue line indicates the amount of gain reduction). So we calculated the sample-wise (element-wise) level ratio between the original musdb-hq mixture and the musdb-XL mixture for each song, then applied that ratio to each stem of musdb-hq to get the ground-truth stems of musdb-XL.
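The stem construction described above can be sketched in a few lines. This is a minimal illustration, not the code in this repository: the function names `gain_ratio` and `apply_ratio`, the toy sample values, and the choice to fall back to a ratio of 1.0 where the original mixture is (near) silent are all assumptions made for the example. Plain Python lists keep the sketch dependency-free; real audio would normally live in NumPy arrays.

```python
def gain_ratio(mix_hq, mix_xl, eps=1e-8):
    """Per-sample ratio between the limited mixture and the original mixture.

    Assumption: where the original sample is (near) zero the ratio is
    undefined, so 1.0 (no gain change) is used instead.
    """
    return [xl / hq if abs(hq) > eps else 1.0 for hq, xl in zip(mix_hq, mix_xl)]


def apply_ratio(stem, ratio):
    """Scale a stem sample by sample with the mixture's gain ratio."""
    return [s * r for s, r in zip(stem, ratio)]


# Toy mono signals (hypothetical values, four samples each).
vocals = [0.5, -0.25, 0.0, 0.4]
drums = [0.4, -0.35, 0.0, -0.4]
mix_hq = [v + d for v, d in zip(vocals, drums)]

# Stand-in for a limiter: only the loudest sample is turned down.
limiter_gain = [0.5, 1.0, 1.0, 1.0]
mix_xl = [m * g for m, g in zip(mix_hq, limiter_gain)]

ratio = gain_ratio(mix_hq, mix_xl)
vocals_xl = apply_ratio(vocals, ratio)
drums_xl = apply_ratio(drums, ratio)

# The scaled stems should sum back to the limited mixture.
recon = [v + d for v, d in zip(vocals_xl, drums_xl)]
print(max(abs(a - b) for a, b in zip(recon, mix_xl)))
```

Because every stem is scaled by the same per-sample ratio, the scaled stems sum back to the limited mixture, up to floating-point error.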

How to use musdb-XL for the evaluation

First, download the sample-wise (element-wise) ratios between musdb-XL (or L) and the musdb-hq files from Zenodo. If you want to reproduce our experiments, download both the musdb-L and musdb-XL ratio data. But if you just want to use the data as a benchmark for robust music source separation, we recommend using only musdb-XL, because its characteristics and loudness are closer to real-world commercial music. musdb-L was originally made for experimental purposes, to check the relationship between model performance and overall loudness.

Download page

Then unzip the files and run make_L_and_XL.py from this repository.

Note that the path given to the --L_XL_ratio_root argument should contain the unzipped ratio folders, like below.

/where/downloaded/data/is/musdb_XL_ratio (and additional /where/downloaded/data/is/musdb_L_ratio)

# musdb-hq must be downloaded first (see https://sigsep.github.io/datasets/musdb.html).
# Use --only_XL=True to make only musdb-XL; the default is False (both musdb-L and musdb-XL are made).
python make_L_and_XL.py \
  --save_dir=/where/to/save/musdb_XL \
  --musdb_hq_root=/path/where/original/musdb-hq \
  --L_XL_ratio_root=/where/downloaded/data/is \
  --only_XL=True

We know that this is a cumbersome process, but for copyright reasons we were unable to make musdb-L and XL directly downloadable.

After making the dataset, you can use musdb-XL to evaluate your model with the musdb library. Just change the root folder when you instantiate the musdb.DB class.

import musdb
mus_test = musdb.DB(root='/path/where/musdb_XL', subsets="test", is_wav=True, sample_rate=44100)

Notes

Q : As you can find in Table 1 of our paper, musdb-XL is still 0.6 LUFS quieter than commercial music. Why is that?

A : Actually, we could get more loudness by pushing the limiter harder, but that made the music sound very distorted and unnatural. Music mastering is not just a single step of applying a limiter. To achieve high loudness with natural-sounding results, other processors such as equalizers (EQ) are required. However, since our work focuses only on the effect of the limiter, we did not consider any other processing. Also, we thought 0.6 LUFS is quite a small difference. Though the list varies from month to month, the songs in Tidal's Pop life playlist that we investigated in our paper are from famous artists that you might have already heard of. In my personal, subjective opinion as a music lover and listener, it seems that well-known producers and artists want their music to be louder than others'. That is, the sample mean LUFS in our investigation might be higher than the population mean.

Q : The sum of the stems is not exactly the same as the mixture. Why is that?

A : It is because of clipping errors when saving the stems as 16-bit PCM wave files. When calculating the ground-truth stems of musdb-XL, some waveform values can exceed 0 dBFS (1 or -1 in float), though there are no such values in the musdb-XL mixtures. Saving such stem waveforms as 16-bit PCM wave files forcibly clips them, which results in the inconsistency between the sum of the stems and the mixture. You can hear tick (or pop) noise (to me, it sounds like the noise of an old vinyl LP) in some regions of the stems. A procedure such as 'apply a simple linear gain reduction (roughly -6 dB, i.e. multiply the waveforms by 0.5), save, then load at evaluation time and restore the original gain (+6 dB)' can solve the problem, but we thought the error was too trivial to justify such a complex and unintuitive process.
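The clipping effect and the -6 dB workaround mentioned above can be reproduced with a toy example. This is only a sketch under assumptions: `to_pcm16`, `from_pcm16`, `roundtrip` and `max_error` are hypothetical helpers that mimic 16-bit quantisation with a scale of 32768 (audio libraries may differ in detail), and the sample values are made up.

```python
FULL_SCALE = 32768  # assumed integer scale for 16-bit PCM


def to_pcm16(samples):
    """Quantise float samples to 16-bit integers, hard-clipping out-of-range values."""
    return [max(-FULL_SCALE, min(FULL_SCALE - 1, round(s * FULL_SCALE))) for s in samples]


def from_pcm16(codes):
    """Convert 16-bit integer codes back to float samples."""
    return [c / FULL_SCALE for c in codes]


def roundtrip(samples, gain=1.0):
    """Apply `gain`, 'save' as 16-bit PCM, 'load', then undo the gain."""
    return [s / gain for s in from_pcm16(to_pcm16([gain * s for s in samples]))]


def max_error(stems, mixture, gain):
    """Largest gap between the sum of the round-tripped stems and the mixture."""
    summed = [sum(vals) for vals in zip(*(roundtrip(s, gain) for s in stems))]
    return max(abs(a - b) for a, b in zip(summed, mixture))


# Hypothetical stems: stem_a exceeds full scale (1.2 > 1.0) at the first sample,
# but the mixture itself stays below 1.0.
stem_a = [1.2, -0.3, 0.5]
stem_b = [-0.4, 0.1, 0.25]
mixture = [a + b for a, b in zip(stem_a, stem_b)]

err_direct = max_error([stem_a, stem_b], mixture, gain=1.0)  # saved as-is, clipped
err_safe = max_error([stem_a, stem_b], mixture, gain=0.5)    # saved with -6 dB headroom

print(err_direct, err_safe)
```

Note that halving the gain only removes the clipping if no stem sample exceeds 2.0 in float (about +6 dBFS); the remaining error is ordinary quantisation noise.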

Q : What does 'un-musical' mean in Section 4.2 of the paper?

A : A limiter is generally triggered by the audio level calculated from the average of the two audio channels (left and right). Let's assume that the input signal of a limiter is panned to the right channel, as in the 'PR - Oh No' track. Then the limiter will reduce the gain of the left channel as well, triggered mainly by the right channel, even though the left channel is not actually loud enough to need gain reduction. This is an example of un-musical use of a limiter.
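A toy stereo-linked limiter makes this concrete. The sketch below is an assumption-laden illustration, not the Ozone Maximizer algorithm: `linked_limiter` is a hypothetical function with no attack, release, or look-ahead, and it uses the mean of the absolute left and right sample values as the detector level.

```python
def linked_limiter(left, right, threshold):
    """Apply one shared gain per sample to both channels.

    The gain is driven by the average absolute level of the two channels
    (an assumed, simplified detector).
    """
    out_left, out_right, gains = [], [], []
    for x_l, x_r in zip(left, right):
        level = (abs(x_l) + abs(x_r)) / 2
        gain = min(1.0, threshold / level) if level > 0 else 1.0
        gains.append(gain)
        out_left.append(x_l * gain)
        out_right.append(x_r * gain)
    return out_left, out_right, gains


# Hypothetical right-panned input: the left channel is quiet throughout,
# the right channel has one loud sample.
left = [0.1, 0.1, 0.1]
right = [0.2, 0.9, 0.3]
threshold = 0.25

out_left, out_right, gains = linked_limiter(left, right, threshold)
print(gains)
print(out_left)
```

At the second sample the left channel is well below the threshold on its own, yet it is still turned down because the loud right channel drives the shared gain.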

Acknowledgements

We thank Zafar Rafii, the author of musdb, for allowing us to reprocess the original musdb data. We thank Antoine Liutkus, also the author of musdb18, for his creative suggestion on how to distribute our proposed datasets. We are grateful to Ben Sangbae Chon, Keunwoo Choi, and Hyeongi Moon from GaudioLab, Inc. for fruitful discussions on our proposed methods. Last but not least, we thank Donmoon Lee, Juheon Lee, Jaejun Lee, Junghyun Koo, and Sungho Lee for their helpful feedback.

References

[1] Rafii, Zafar, et al. MUSDB18. (https://sigsep.github.io/datasets/musdb.html)

[2] Stöter, Fabian-Robert, et al. Python parser and tools for MUSDB18 Music Separation Dataset. (https://github.com/sigsep/sigsep-mus-db)
