
multipitch_architectures

This is a PyTorch code repository accompanying the following paper:

Christof Weiß and Geoffroy Peeters
Comparing Deep Models and Evaluation Strategies for Multi-Pitch Estimation in Music Recordings
IEEE/ACM Transactions on Audio, Speech & Language Processing, 2022
https://ieeexplore.ieee.org/document/9865174

This repository contains only exemplary code and pre-trained models for most of the paper's experiments, as well as some individual examples. All datasets used in the paper are publicly available (at least partially); for details and references, please see the paper.

In addition, we provide information on version duplicates in MusicNet (MusicNet_stats.md) and detailed information on the different training-test splits used in our experiments (as JSON and Markdown files in the folder dataset_splits).
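
As a minimal illustration, the split definitions can be inspected as shown below; this sketch assumes the split files are plain JSON, and the actual file names and keys are determined by the files in dataset_splits.

  import json
  from pathlib import Path

  # Minimal sketch: list and inspect the split definitions in dataset_splits/.
  # The file names and JSON keys printed here depend on the actual files.
  for split_file in sorted(Path("dataset_splits").glob("*.json")):
      with open(split_file) as f:
          split = json.load(f)
      print(split_file.name, list(split))  # e.g. lists of training/validation/test items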

Feature extraction and prediction (Jupyter notebooks)

In the repository's top folder, two Jupyter notebooks (01_precompute_features and 02_predict_with_pretrained_model) demonstrate how to preprocess audio files for our models and how to load a pre-trained model to predict pitches.
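
For orientation, the following sketch outlines the general prediction workflow covered by the second notebook. The model class, checkpoint path, and feature shape below are placeholders rather than the repository's actual identifiers; please refer to the notebook for the real definitions.

  import torch
  import torch.nn as nn

  # Placeholder stand-in for one of the repository's multi-pitch models;
  # the real model classes and their constructor arguments differ.
  class PlaceholderModel(nn.Module):
      def __init__(self, n_bins=216, n_pitches=72):
          super().__init__()
          # Collapse the frequency axis into per-pitch activations per frame.
          self.conv = nn.Conv2d(1, n_pitches, kernel_size=(n_bins, 1))

      def forward(self, x):                # x: (batch, 1, n_bins, n_frames)
          return self.conv(x).squeeze(2)   # (batch, n_pitches, n_frames)

  model = PlaceholderModel()
  # With the repository's real model class, a checkpoint from
  # experiments/models_pretrained would be loaded here, e.g.:
  # model.load_state_dict(torch.load("experiments/models_pretrained/....pt", map_location="cpu"))
  model.eval()

  features = torch.randn(1, 1, 216, 500)   # stands in for precomputed input features (notebook 01)
  with torch.no_grad():
      pitch_activations = torch.sigmoid(model(features))  # frame-wise pitch salience in [0, 1]
  pitch_estimates = pitch_activations > 0.5               # simple thresholding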

Experiments from the paper (Python scripts)

The experiments folder contains all experimental scripts, the log files (subfolder logs), and the filewise results (subfolder results_filewise). The folder models_pretrained contains pre-trained models for the main experiments, and the subfolder predictions contains exemplary model predictions for two of the experiments. Please note that re-training requires a GPU as well as the pre-processed training data (see the notebook 01_precompute_features for an example). All scripts must be started from the repository's top-level folder so that relative paths resolve correctly.

The experiment files' names relate to the paper's results in the following way:

Exp1_SectionIV-B

Experiments from Section IV.B (Table II / Fig. 4) - Model Architectures and Sizes. The suffix _rerun denotes additional training/test runs of a model.

(a) CNN (simple)

  • CNN:XS - exp126a_musicnet_cnn_basic
  • CNN:S - exp126b_musicnet_cnn_wide
  • CNN:M - exp126c_musicnet_cnn_verywide
  • CNN:L - exp126d_musicnet_cnn_extremelywide

(b) DCNN (deep)

  • DCNN:S - exp127a_musicnet_cnn_deepbasic
  • DCNN:M - exp127b_musicnet_cnn_deepwide
  • DCNN:L - exp127c_musicnet_cnn_deepverywide

(c) DRCNN (deep residual)

  • DRCNN:S - exp128a_musicnet_cnn_deepresnetbasic
  • DRCNN:M - exp128b_musicnet_cnn_deepresnetwide
  • DRCNN:L - exp128c_musicnet_cnn_deepresnetverywide
  •   —  exp128c_musicnet_cnn_deepresnetverywide_rerun1
  •   —  exp128c_musicnet_cnn_deepresnetverywide_rerun2

(d) Unet

  • Unet:S - exp160d2_musicnet_unet_large_bugfix
  • Unet:M - exp160g_musicnet_unet_medium_bugfix
  •   —  exp160g_musicnet_unet_medium_bugfix_rerun1
  •   —  exp160g_musicnet_unet_medium_bugfix_rerun2
  • Unet:L - exp160e3_musicnet_unet_verylarge_bugfix_scaled
  •   —  exp160e3_musicnet_unet_verylarge_bugfix_scaled_rerun1
  •   —  exp160e3_musicnet_unet_verylarge_bugfix_scaled_rerun2
  • Unet:XL - exp160f_musicnet_unet_veryverylarge
  •   —  exp160f_musicnet_unet_veryverylarge_rerun1
  •   —  exp160f_musicnet_unet_veryverylarge_rerun2

(e) SAUnet (self-attention at bottleneck)

  • SAUnet:M - exp180b_musicnet_unet_verylarge_doubleselfattn
  • SAUnet:L - exp180d_musicnet_unet_extremelylarge_doubleselfattn
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun1
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun2
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun3
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_rerun4
  • SAUnet:XL - exp180e_musicnet_unet_insanelylarge_doubleselfattn
  •   —  exp180e_musicnet_unet_insanelylarge_doubleselfattn_rerun1
  •   —  exp180e_musicnet_unet_insanelylarge_doubleselfattn_rerun2
  • SAUnet:XXL - exp180f_musicnet_unet_intermedlarge_doubleselfattn
  •   —  exp180f_musicnet_unet_intermedlarge_doubleselfattn_rerun

(f) SAUSnet (self-attention also at lowest skip connection)

  • SAUSnet:M - exp181b_musicnet_unet_verylarge_doubleselfattn_twolayers
  • SAUSnet:L - exp181d_musicnet_unet_verylarge_doubleselfattn_twolayers
  • SAUSnet:XL - exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers
  •   —  exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_rerun1
  •   —  exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_rerun2
  • SAUSnet:XXL - exp181e_musicnet_unet_insanelylarge_doubleselfattn_twolayers

(g) BLUnet (BiLSTM at bottleneck)

  • BLUnet:M - exp186b_musicnet_unet_verylarge_blstm
  • BLUnet:L - exp186d_musicnet_unet_extremelylarge_blstm
  • BLUnet:XXL - exp186e_musicnet_unet_insanelylarge_blstm

(h) PUnet (multi-task with degree-of-polyphony estimation)

  • PUnet:M - exp195g_musicnet_unet_extremelylarge_polyphony_softmax
  • PUnet:L - exp195e3_musicnet_unet_extremelylarge_polyphony_softmax
  • PUnet:XL - exp195f_musicnet_unet_extremelylarge_polyphony_softmax
  •   —  exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun1
  •   —  exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun2

Exp2_SectionIV-C

Experiments from Section IV.C (Table IV) - Model Generalization (more training samples, other test sets). The suffix _rerun denotes additional training/test runs of a model.

(a) Test set MuN-10a (more training samples)

  • Unet:XL - exp160f_musicnet_unet_veryverylarge_moresamples
  •   —  exp160f_musicnet_unet_veryverylarge_moresamples_rerun1
  •   —  exp160f_musicnet_unet_veryverylarge_moresamples_rerun2
  • SAUnet:L - exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun1
  •   —  exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun2
  • SAUSnet:XL - exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples
  • PUnet:XL - exp195f_musicnet_unet_extremelylarge_polyphony_softmax_moresamples

(b) Test set MuN-10 (original)

  • Unet:XL - RETRAIN_exp160f_musicnet_unet_veryverylarge_moresamples
  •   —  RETRAIN_exp160f_musicnet_unet_veryverylarge_moresamples_rerun1
  •   —  RETRAIN_exp160f_musicnet_unet_veryverylarge_moresamples_rerun2
  • SAUnet:L - RETRAIN_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
  •   —  RETRAIN_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun1
  •   —  RETRAIN_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun2
  • SAUSnet:XL - RETRAIN_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples
  •   —  RETRAIN_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples_rerun1
  •   —  RETRAIN_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples_rerun2
  • PUnet:XL - RETRAIN_exp195f_musicnet_unet_extremelylarge_polyphony_softmax

(c) Test set MuN-3 (90s)

  • see models from (a) Test set MuN-10a

(d) Test set MuN-10b (slow movements)

  • SAUnet:L - RETRAIN2_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples

(e) Test set MuN-10c (fast movements)

  • SAUnet:L - RETRAIN3_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples

(f) Test set MuN-10full (all movements of the ten work cycles)

  • CNN:M - RETRAIN4_exp127c_musicnet_cnn_verywide_moresamples
  • DRCNN:L - RETRAIN4_exp128c_musicnet_cnn_deepresnetwide_moresamples
  •   —  RETRAIN4_exp128c_musicnet_cnn_deepresnetwide_moresamples_rerun1
  •   —  RETRAIN4_exp128c_musicnet_cnn_deepresnetwide_moresamples_rerun2
  • Unet:M - RETRAIN4_exp160f_musicnet_unet_veryverylarge_moresamples
  • Unet:XL - RETRAIN4_exp160g_musicnet_unet_medium_moresamples
  • SAUnet:L - RETRAIN4_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples
  •   —  RETRAIN4_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun1
  •   —  RETRAIN4_exp180d_musicnet_unet_extremelylarge_doubleselfattn_moresamples_rerun2
  • SAUSnet:XL - RETRAIN4_exp181f_musicnet_unet_intermedlarge_doubleselfattn_twolayers_moresamples
  • BLUnet:L - RETRAIN4_exp186d_musicnet_unet_extremelylarge_blstm_moresamples
  • PUnet:XL - RETRAIN4_exp195f_musicnet_unet_extremelylarge_polyphony_softmax
  •   —  RETRAIN4_exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun1
  •   —  RETRAIN4_exp195f_musicnet_unet_extremelylarge_polyphony_softmax_rerun2

Exp3_SectionIV-D

Experiments from Section IV.D (Fig. 6) - Cross-Version Study on Schubert Winterreise.

CNN:M

  • Version split: exp200a_schubert_versionsplit_cnn_verywide
  • Song split: exp200b_schubert_songsplit_cnn_verywide
  • Neither split: exp200c_schubert_neithersplit_cnn_verywide

SAUnet:L

  • Version split: exp201a_schubert_versionsplit_unet_extremelylarge_doubleselfattn
  • Song split: exp201b_schubert_songsplit_unet_extremelylarge_doubleselfattn
  • Neither split: exp201c_schubert_neithersplit_unet_extremelylarge_doubleselfattn

Exp4_SectionIV-E

Experiments from Section IV.E (Fig. 7) - Cross-Dataset Study on the Big Mix dataset, compiled from all source datasets. The suffix _rerun denotes additional training/test runs of a model.

  • CNN:M - exp216c_bigmix_cnn_verywide
  •   —  exp216c_bigmix_cnn_verywide_rerun1
  •   —  exp216c_bigmix_cnn_verywide_rerun2
  • DRCNN:L - exp214c_bigmix_cnn_deepresnetwide
  •   —  exp214c_bigmix_cnn_deepresnetwide_rerun1
  •   —  exp214c_bigmix_cnn_deepresnetwide_rerun2
  • Unet:M - exp213g_bigmix_unet_medium
  •   —  exp213g_bigmix_unet_medium_rerun1
  •   —  exp213g_bigmix_unet_medium_rerun2
  • Unet:XL - exp212f_bigmix_unet_veryverylarge
  •   —  exp212f_bigmix_unet_veryverylarge_rerun1
  •   —  exp212f_bigmix_unet_veryverylarge_rerun2
  • SAUnet:L - exp210d_bigmix_unet_extremelylarge_doubleselfattn
  •   —  exp210d_bigmix_unet_extremelylarge_doubleselfattn_rerun1
  •   —  exp210d_bigmix_unet_extremelylarge_doubleselfattn_rerun2
  • SAUSnet:XL - exp211f_bigmix_unet_intermedlarge_doubleselfattn_twolayers
  •   —  exp211f_bigmix_unet_intermedlarge_doubleselfattn_twolayers_rerun1
  •   —  exp211f_bigmix_unet_intermedlarge_doubleselfattn_twolayers_rerun2
  • BLUnet:L - exp217d_bigmix_unet_extremelylarge_blstm
  •   —  exp217d_bigmix_unet_extremelylarge_blstm_rerun1
  •   —  exp217d_bigmix_unet_extremelylarge_blstm_rerun2
  • PUnet:XL - exp215f_bigmix_unet_extremelylarge_polyphony_softmax
  •   —  exp215f_bigmix_unet_extremelylarge_polyphony_softmax_rerun1
  •   —  exp215f_bigmix_unet_extremelylarge_polyphony_softmax_rerun2

Run the scripts using, e.g., the following commands:

  conda activate multipitch_architectures
  export CUDA_VISIBLE_DEVICES=1
  python experiments/Exp1_SectionIV-B/exp126a_musicnet_cnn_basic.py


License

Creative Commons Zero v1.0 Universal

