
multipitch_architectures

This is a PyTorch code repository accompanying the following paper:

Christof Weiß and Geoffroy Peeters
Comparing Deep Models and Evaluation Strategies for Multi-Pitch Estimation in Music Recordings
IEEE/ACM Transactions on Audio, Speech & Language Processing, 2022
https://ieeexplore.ieee.org/document/9865174

This repository contains only exemplary code and pre-trained models for most of the paper's experiments, as well as some individual examples. All datasets used in the paper are publicly available (at least partially); for details and references, please see the paper.

In addition, we provide information on version duplicates in MusicNet (MusicNet_stats.md) and detailed information on the different training-test splits used in our experiments (as JSON and Markdown files in the folder dataset_splits).
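
As a minimal illustration, the split definitions can be inspected as shown below; this sketch assumes the split files are plain JSON, and the actual file names and keys are determined by the files in dataset_splits.

  import json
  from pathlib import Path

  # Minimal sketch: list and inspect the split definitions in dataset_splits/.
  # The file names and JSON keys printed here depend on the actual files.
  for split_file in sorted(Path("dataset_splits").glob("*.json")):
      with open(split_file) as f:
          split = json.load(f)
      print(split_file.name, list(split))  # e.g. lists of training/validation/test items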

Feature extraction and prediction (Jupyter notebooks)

In the repository's top folder, two Jupyter notebooks (01_precompute_features and 02_predict_with_pretrained_model) demonstrate how to preprocess audio files for our models and how to load a pre-trained model to predict pitches.
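
For orientation, the following sketch outlines the general prediction workflow covered by the second notebook. The model class, checkpoint path, and feature shape below are placeholders rather than the repository's actual identifiers; please refer to the notebook for the real definitions.

  import torch
  import torch.nn as nn

  # Placeholder stand-in for one of the repository's multi-pitch models;
  # the real model classes and their constructor arguments differ.
  class PlaceholderModel(nn.Module):
      def __init__(self, n_bins=216, n_pitches=72):
          super().__init__()
          # Collapse the frequency axis into per-pitch activations per frame.
          self.conv = nn.Conv2d(1, n_pitches, kernel_size=(n_bins, 1))

      def forward(self, x):                # x: (batch, 1, n_bins, n_frames)
          return self.conv(x).squeeze(2)   # (batch, n_pitches, n_frames)

  model = PlaceholderModel()
  # With the repository's real model class, a checkpoint from
  # experiments/models_pretrained would be loaded here, e.g.:
  # model.load_state_dict(torch.load("experiments/models_pretrained/....pt", map_location="cpu"))
  model.eval()

  features = torch.randn(1, 1, 216, 500)   # stands in for precomputed input features (notebook 01)
  with torch.no_grad():
      pitch_activations = torch.sigmoid(model(features))  # frame-wise pitch salience in [0, 1]
  pitch_estimates = pitch_activations > 0.5               # simple thresholding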

Experiments from the paper (Python scripts)

The experiments folder contains all experimental scripts, the log files (subfolder logs), and the filewise results (subfolder results_filewise). The folder models_pretrained contains pre-trained models for the main experiments, and the subfolder predictions contains exemplary model predictions for two of the experiments. Please note that re-training requires a GPU as well as the pre-processed training data (see the notebook 01_precompute_features for an example). All scripts must be started from the repository's top-level folder so that relative paths resolve correctly.

The experiment files' names relate to the paper's results in the following way:

Exp1_SectionIV-B

Experiments from Section IV.B (Table II / Fig. 4) - Model Architectures and Sizes. The suffix _rerun denotes additional training/test runs of a model.

(a) CNN (simple)

  • CNN:XS - exp126a_musicnet_cnn_basic
  • CNN:S - exp126b_musicnet_cnn_wide
  • CNN:M - exp126c_musicnet_cnn_verywide
  • CNN:L - exp126d_musicnet_cnn_extremelywide

(b) DCNN (deep)

  • DCNN:S - exp127a_musicnet_cnn_deepbasic
  • DCNN:M - exp127b_musicnet_cnn_deepwide
  • DCNN:L - exp127c_musicnet_cnn_deepverywide

(c) DRCNN (deep residual)

  • DRCNN:S - exp128a_musicnet_cnn_deepresnetbasic
  • DRCNN:M - exp128b_musicnet_cnn_deepresnetwide
  • DRCNN:L - exp128c_musicnet_cnn_deepresnetverywide
  •   —  exp128c_musicnet_cnn_deepresnetverywide_rerun1
  •   —  exp128c_musicnet_cnn_deepresnetverywide_rerun2

(d) Unet

  • Unet:S - exp160d2_musicnet_unet_large_bugfix
  • Unet:M - exp160g_musicnet_unet_medium_bugfix
  •   —  exp160g_musicnet_unet_medium_bugfix_rerun1
  •   —  exp160g_musicnet_unet_medium_bugfix_rerun2
  • Unet:L - exp160e3_musicnet_unet_verylarge_bugfix_scaled
  •   —  exp160e3_musicnet_unet_verylarge_bugfix_scaled_rerun1
  •   —  exp160e3_musicnet_unet_verylarge_bugfix_scaled_rerun2
  • Unet:XL - exp160f_musicnet_unet_veryverylarge
  •   —  exp160f_musicnet_unet_veryverylarge_rerun1
  •   —  exp160f_musicnet_unet_veryverylarge_rerun2

(e) SAUnet (self-attention at bottleneck)

  • SAUnet:M - exp180b_musicnet_unet_verylarge_doubleselfattn
  • SAUnet:L - exp180d_musicnet_unet_extremelylarge_doubleselfattn
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun1
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun2
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun3
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun4
  • SAUnet:XL - exp180e_musicnet_unet_insanelylarge_doubleselfattn
  •   —  exp180e_musicnet_unet_insanelylarge_doubleselfattn_rerun1
  •   —  exp180e_musicnet_unet_insanelylarge_doubleselfattn_rerun2
  • SAUnet:XXL - exp180f_musicnet_unet_intermedlarge_doubleselfattn
  •   —  exp180f_musicnet_unet_intermedlarge_doubleselfattn_rerun

(f) SAUSnet (self-attention also at lowest skip connection)

  • SAUSnet:M - exp181b_musicnet_unet_verylarge_doubleselfattn_twolayers
  • SAUSnet:L - exp181d_musicnet_unet_verylarge_doubleselfattn_twolayers
  • SAUSnet:XL - exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers
  •   —  exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_rerun1
  •   —  exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_rerun2
  • SAUSnet:XXL - exp181e_musicnet_unet_insanelylarge_doubleselfattn_twolayers

(g) BLUnet (BiLSTM at bottleneck)

  • BLUnet:M - exp186b_musicnet_unet_verylarge_blstm
  • BLUnet:L - exp186d_musicnet_unet_extremelylarge_blstm
  • BLUnet:XXL - exp186e_musicnet_unet_insanelylarge_blstm

(h) PUnet (multi-task with degree-of-polyphony estimation)

  • PUnet:M - exp195g_musicnet_unet_extremelylarge_polyphony_softmax
  • PUnet:L - exp195e3_musicnet_unet_extremelylarge_polyphony_softmax
  • PUnet:XL - exp195f_musicnet_unet_extremelylarge_polyphony_softmax
  •   —  exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun1
  •   —  exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun2

Exp2_SectionIV-C

Experiments from Section IV.C (Table IV) - Model Generalization (more training samples, other test sets). The suffix _rerun denotes additional training/test runs of a model.

(a) Test set MuN-10a (more training samples)

  • Unet:XL - exp160f_musicnet_unet_veryverylarge_moresamples
  •   —  exp160f_musicnet_unet_veryverylarge_moresamples_rerun1
  •   —  exp160f_musicnet_unet_veryverylarge_moresamples_rerun2
  • SAUnet:L - exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun1
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun2
  • SAUSnet:XL - exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples
  • PUnet:XL - exp195f_musicnet_unet_extremelylarge_polyphony_softmax_moresamples

(b) Test set MuN-10 (original)

  • Unet:XL - RETRAIN_exp160f_musicnet_unet_veryverylarge_moresamples
  •   —  RETRAIN_exp160f_musicnet_unet_veryverylarge_moresamples_rerun1
  •   —  RETRAIN_exp160f_musicnet_unet_veryverylarge_moresamples_rerun2
  • SAUnet:L - RETRAIN_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
  •   —  RETRAIN_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun1
  •   —  RETRAIN_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun2
  • SAUSnet:XL - RETRAIN_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples
  •   —  RETRAIN_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples_rerun1
  •   —  RETRAIN_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples_rerun2
  • PUnet:XL - RETRAIN_exp195f_musicnet_unet_extremelylarge_polyphony_softmax

(c) Test set MuN-3 (90s)

  • see models from (a) Test set MuN-10a

(d) Test set MuN-10b (slow movements)

  • SAUnet:L - RETRAIN2_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples

(e) Test set MuN-10c (fast movements)

  • SAUnet:L - RETRAIN3_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples

(f) Test set MuN-10full (all movements of the ten work cycles)

  • CNN:M - RETRAIN4_exp127c_musicnet_cnn_verywide_moresamples
  • DRCNN:L - RETRAIN4_exp128c_musicnet_cnn_deepresnetwide_moresamples
  •   —  RETRAIN4_exp128c_musicnet_cnn_deepresnetwide_moresamples_rerun1
  •   —  RETRAIN4_exp128c_musicnet_cnn_deepresnetwide_moresamples_rerun2
  • Unet:M - RETRAIN4_exp160f_musicnet_unet_veryverylarge_moresamples
  • Unet:XL - RETRAIN4_exp160g_musicnet_unet_medium_moresamples
  • SAUnet:L - RETRAIN4_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
  •   —  RETRAIN4_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun1
  •   —  RETRAIN4_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun2
  • SAUSnet:XL - RETRAIN4_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples
  • BLUnet:L - RETRAIN4_exp186d_musicnet_unet_extremelylarge_blstm_moresamples
  • PUnet:XL - RETRAIN4_exp195f_musicnet_unet_extremelylarge_polyphony_softmax
  •   —  RETRAIN4_exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun1
  •   —  RETRAIN4_exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun2

Exp3_SectionIV-D

Experiments from Section IV.D (Fig. 6) - Cross-Version Study on Schubert Winterreise.

CNN:M

  • Version split: exp200a_schubert_versionsplit_cnn_verywide
  • Song split: exp200b_schubert_songsplit_cnn_verywide
  • Neither split: exp200c_schubert_neithersplit_cnn_verywide

SAUnet:L

  • Version split: exp201a_schubert_versionsplit_unet_extremelylarge_doubleselfattn
  • Song split: exp201b_schubert_songsplit_unet_extremelylarge_doubleselfattn
  • Neither split: exp201c_schubert_neithersplit_unet_extremelylarge_doubleselfattn

Exp4_SectionIV-E

Experiments from Section IV.E (Fig. 7) - Cross-Dataset Study on the Big Mix dataset, compiled from all source datasets. The suffix _rerun denotes additional training/test runs of a model.

  • CNN:M - exp216c_bigmix_cnn_verywide
  •   —  exp216c_bigmix_cnn_verywide_rerun1
  •   —  exp216c_bigmix_cnn_verywide_rerun2
  • DRCNN:L - exp214c_bigmix_cnn_deepresnetwide
  •   —  exp214c_bigmix_cnn_deepresnetwide_rerun1
  •   —  exp214c_bigmix_cnn_deepresnetwide_rerun2
  • Unet:M - exp213g_bigmix_unet_medium
  •   —  exp213g_bigmix_unet_medium_rerun1
  •   —  exp213g_bigmix_unet_medium_rerun2
  • Unet:XL - exp212f_bigmix_unet_veryverylarge
  •   —  exp212f_bigmix_unet_veryverylarge_rerun1
  •   —  exp212f_bigmix_unet_veryverylarge_rerun2
  • SAUnet:L - exp210d_bigmix_unet_extremelylarge_doubleselfattn
  •   —  exp210d_bigmix_unet_extremelylarge_doubleselfattn_rerun1
  •   —  exp210d_bigmix_unet_extremelylarge_doubleselfattn_rerun2
  • SAUSnet:XL - exp211f_bigmix_unet_intermedlarge_doubleselfattn_twolayers
  •   —  exp211f_bigmix_unet_intermedlarge_doubleselfattn_twolayers_rerun1
  •   —  exp211f_bigmix_unet_intermedlarge_doubleselfattn_twolayers_rerun2
  • BLUnet:L - exp217d_bigmix_unet_extremelylarge_blstm
  •   —  exp217d_bigmix_unet_extremelylarge_blstm_rerun1
  •   —  exp217d_bigmix_unet_extremelylarge_blstm_rerun2
  • PUnet:XL - exp215f_bigmix_unet_extremelylarge_polyphony_softmax
  •   —  exp215f_bigmix_unet_extremelylarge_polyphony_softmax_rerun1
  •   —  exp215f_bigmix_unet_extremelylarge_polyphony_softmax_rerun2

Run the scripts using, e.g., the following commands:

  conda activate multipitch_architectures
  export CUDA_VISIBLE_DEVICES=1
  python experiments/Exp1_SectionIV-B/exp126a_musicnet_cnn_basic.py


License

Creative Commons Zero v1.0 Universal

