Install requirements:
$ pip install --user git+https://github.com/fgnt/lazy_dataset.git@d500d23d23c0cc2ebb874c4974b4ffa7a2418b96
$ pip install --user git+https://github.com/fgnt/paderbox.git@f0e7b0bf66a0ee6e5f51797305d84cf57227134d
$ pip install --user git+https://github.com/fgnt/padertorch.git@9985d398c10ec086e18f7525c7e7dc2809c1e7f3
Clone the repository:
$ git clone https://github.com/fgnt/pb_sed.git
Install package:
$ pip install --user -e pb_sed
Install requirements for the database download:
$ pip install --user git+https://github.com/turpaultn/DESED@2fb7fe0b4b33569ad3693d09e50037b8b4206b72
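Optionally, the installation can be sanity-checked with a minimal Python snippet; the import names below are assumed to match the repository names.
# Optional sanity check: verify that all installed packages can be imported.
# Import names are assumed to match the repository names.
for name in ["lazy_dataset", "paderbox", "padertorch", "pb_sed", "desed"]:
    try:
        module = __import__(name)
        print(f"{name}: OK ({getattr(module, '__version__', 'no version info')})")
    except ImportError as error:
        print(f"{name}: not importable ({error})")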
Download the database by running
$ python -m pb_sed.database.desed.download -db /path/to/desed
yielding the following database structure
├── real
│   ├── audio
│   │   ├── eval
│   │   │   ├── eval_dcase2019
│   │   │   └── eval_dcase2020
│   │   ├── train
│   │   │   ├── unlabel_in_domain
│   │   │   └── weak
│   │   └── validation
│   │       └── validation
│   ├── dataset
│   │   ├── audio
│   │   │   └── eval
│   │   └── metadata
│   │       └── eval
│   ├── metadata
│   │   ├── eval
│   │   ├── train
│   │   └── validation
│   └── missing_files
├── rir_data
│   ├── eval
│   ├── train
│   └── validation
└── synthetic
    ├── audio
    │   ├── eval
    │   │   └── soundbank
    │   └── train
    │       ├── soundbank
    │       └── synthetic20
    ├── dcase2019
    │   └── dataset
    │       ├── audio
    │       └── metadata
    └── metadata
        └── train
            └── synthetic20
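To quickly verify that the download produced this layout, a minimal check along these lines can be used (the database root is a placeholder):
# Check that the expected top-level directories of the DESED database exist.
from pathlib import Path

db_root = Path('/path/to/desed')  # placeholder, use your actual database root
for subdir in ['real', 'rir_data', 'synthetic']:
    print(subdir, 'found' if (db_root / subdir).is_dir() else 'missing')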
Create a JSON file describing the database by running
$ python -m pb_sed.database.desed.create_json -db /path/to/desed
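The exact output location and layout of the JSON are determined by the script (check its console output). Assuming a lazy_dataset-style database JSON with a top-level "datasets" key, it can be inspected roughly as follows; the file path and key name here are assumptions.
# Rough inspection of the generated database JSON.
# The path below and the 'datasets' key are assumptions based on the
# lazy_dataset JSON database convention; check the create_json output.
import json
from pathlib import Path

json_path = Path('/path/to/desed.json')  # placeholder, see the script output
with json_path.open() as fid:
    database = json.load(fid)

datasets = database.get('datasets', database)
for name, examples in datasets.items():
    print(name, len(examples))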
This repository provides the source code for the 3rd-place solution presented by Paderborn University for the DCASE 2020 Challenge Task 4: Sound event detection and separation in domestic environments. Our submitted system achieved 48.3% and 47.2% event-based F1-score on the validation and evaluation sets, respectively. Later improvements led to 52.8% (±0.6%) event-based F1-score on the validation set, outperforming the winner of the challenge by 2.2% on average (a comparison on the evaluation set is not possible as the ground-truth evaluation labels are not public).
For details see our paper "Forward-Backward Convolutional Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for Weakly Labeled Semi-Supervised Sound Event Detection". If you use this code, please cite our paper as follows:
@inproceedings{Ebbers2020,
    author = "Ebbers, Janek and Haeb-Umbach, Reinhold",
    title = "Forward-Backward Convolutional Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for Weakly Labeled Semi-Supervised Sound Event Detection",
    booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020)",
    address = "Tokyo, Japan",
    month = "November",
    year = "2020",
    pages = "41--45"
}
To train an FBCRNN on only weakly labeled and synthetic data, run
$ python -m pb_sed.experiments.dcase_2020_task_4.train_crnn
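The experiment scripts accept sacred-style 'with key=value' overrides, as used in the commands below. Assuming the standard sacred command line interface is exposed, the resolved configuration can be printed without starting a training:
$ python -m pb_sed.experiments.dcase_2020_task_4.train_crnn print_config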
The prepared DESED database already includes the following weakly pseudo labeled datasets.
Pseudo weak labels used in our paper, i.e. generated by five different FBCRNN ensembles which were trained only on weakly labeled and synthetic data:
- unlabel_in_domain_pseudo_weak_2020-07-03-20-48-45
- unlabel_in_domain_pseudo_weak_2020-07-03-20-49-48
- unlabel_in_domain_pseudo_weak_2020-07-03-20-52-19
- unlabel_in_domain_pseudo_weak_2020-07-03-21-00-48
- unlabel_in_domain_pseudo_weak_2020-07-03-21-05-34
Pseudo weak labels generated by the final FBCRNN ensembles from our paper, i.e. by five different FBCRNN ensembles trained on weakly labeled, synthetic and one of the weakly pseudo labeled datasets from above:
- unlabel_in_domain_pseudo_weak_2020-07-04-13-10-05
- unlabel_in_domain_pseudo_weak_2020-07-04-13-10-19
- unlabel_in_domain_pseudo_weak_2020-07-04-13-10-33
- unlabel_in_domain_pseudo_weak_2020-07-04-13-11-09
- unlabel_in_domain_pseudo_weak_2020-07-04-13-12-06
To train an FBCRNN leveraging, e.g., unlabel_in_domain_pseudo_weak_2020-07-04-13-10-05, run
$ python -m pb_sed.experiments.dcase_2020_task_4.train_crnn with 'unlabel_in_domain_pseudo_weak_timestamp=2020-07-04-13-10-05'
Each training stores checkpoints and metadata (incl. a tensorboard event file) in a directory /path/to/storage_root/dcase_2020_crnn/<timestamp>.
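Since a tensorboard event file is written, training can be monitored with TensorBoard (if installed), e.g.:
$ tensorboard --logdir /path/to/storage_root/dcase_2020_crnn/<timestamp>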
To train a tag-conditioned CNN, pseudo strong labels are required. The prepared DESED database already includes the following strongly pseudo labeled datasets.
Pseudo strong labels used in our paper, i.e. generated by the five different FBCRNN ensembles which were each trained on weakly labeled, synthetic and one of the weakly pseudo labeled datasets from above:
- weak_pseudo_strong_2020-07-04-13-10-05_best_frame_f1_crnn
- weak_pseudo_strong_2020-07-04-13-10-19_best_frame_f1_crnn
- weak_pseudo_strong_2020-07-04-13-10-33_best_frame_f1_crnn
- weak_pseudo_strong_2020-07-04-13-11-09_best_frame_f1_crnn
- weak_pseudo_strong_2020-07-04-13-12-06_best_frame_f1_crnn
- unlabel_in_domain_pseudo_strong_2020-07-04-13-10-05_best_frame_f1_crnn
- unlabel_in_domain_pseudo_strong_2020-07-04-13-10-19_best_frame_f1_crnn
- unlabel_in_domain_pseudo_strong_2020-07-04-13-10-33_best_frame_f1_crnn
- unlabel_in_domain_pseudo_strong_2020-07-04-13-11-09_best_frame_f1_crnn
- unlabel_in_domain_pseudo_strong_2020-07-04-13-12-06_best_frame_f1_crnn
Pseudo strong labels generated by five different Hybrid ensembles (4 FBCRNNs + 4 tag-conditioned CNNs) from our paper, where the CNNs were trained on a pair of the strongly pseudo labeled weak and unlabel_in_domain datasets from above plus synthetic data:
- weak_pseudo_strong_2020-07-05-12-37-18_best_frame_f1_hybrid
- weak_pseudo_strong_2020-07-05-12-37-26_best_frame_f1_hybrid
- weak_pseudo_strong_2020-07-05-12-37-35_best_frame_f1_hybrid
- weak_pseudo_strong_2020-07-05-12-37-45_best_frame_f1_hybrid
- weak_pseudo_strong_2020-07-05-12-37-54_best_frame_f1_hybrid
- unlabel_in_domain_pseudo_strong_2020-07-05-12-37-18_best_frame_f1_hybrid
- unlabel_in_domain_pseudo_strong_2020-07-05-12-37-26_best_frame_f1_hybrid
- unlabel_in_domain_pseudo_strong_2020-07-05-12-37-35_best_frame_f1_hybrid
- unlabel_in_domain_pseudo_strong_2020-07-05-12-37-45_best_frame_f1_hybrid
- unlabel_in_domain_pseudo_strong_2020-07-05-12-37-54_best_frame_f1_hybrid
To train a tag-conditioned CNN, run, e.g.,
$ python -m pb_sed.experiments.dcase_2020_task_4.train_cnn with 'pseudo_strong_suffix=2020-07-05-12-37-18_best_frame_f1_hybrid'
Each training stores checkpoints and metadata (incl. a tensorboard event file) in a directory /path/to/storage_root/dcase_2020_cnn/<timestamp>.
To tune hyper-parameters, namely decision thresholds, median-filter sizes and the context length for FBCRNN-based SED, run
$ python -m pb_sed.experiments.dcase_2020_task_4.tune_hyper_params with 'crnn_dirs=["/path/to/storage_root/dcase_2020_crnn/<timestamp_crnn_1>","/path/to/storage_root/dcase_2020_crnn/<timestamp_crnn_2>",...]' 'cnn_dirs=["/path/to/storage_root/dcase_2020_cnn/<timestamp_cnn_1>","/path/to/storage_root/dcase_2020_cnn/<timestamp_cnn_2>",...]'
Hyper-parameters are stored in an output directory /path/to/storage_root/dcase_2020_hyper_params/<timestamp>.
To perform evaluation, run
$ python -m pb_sed.experiments.dcase_2020_task_4.run_inference with 'hyper_params_dir=/path/to/storage_root/dcase_2020_hyper_params/<timestamp>' 'dataset_names=["validation", "eval_dcase2019"]' 'reference_files=["/path/to/desed/real/metadata/validation/validation.tsv", "/path/to/desed/real/metadata/eval/eval_dcase2019.tsv"]'
To perform inference and write prediction files (pseudo labels) for other data sets, run
$ python -m pb_sed.experiments.dcase_2020_task_4.run_inference with 'hyper_params_dir=/path/to/storage_root/dcase_2020_hyper_params/<timestamp>' 'dataset_names=["weak", "unlabel_in_domain", "eval_dcase2020"]'
For all-in-one inference+evaluation, run:
$ python -m pb_sed.experiments.dcase_2020_task_4.run_inference with 'hyper_params_dir=/path/to/storage_root/dcase_2020_hyper_params/<timestamp>' 'dataset_names=["validation", "eval_dcase2019", "weak", "unlabel_in_domain", "eval_dcase2020"]' 'reference_files=["/path/to/desed/real/metadata/validation/validation.tsv", "/path/to/desed/real/metadata/eval/eval_dcase2019.tsv", None, None, None]'
Predictions are stored as tsv files in an output directory /path/to/storage_root/dcase_2020_inference/<timestamp>.
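The prediction files are tab-separated; the column names used below (filename, onset, offset, event_label) are an assumption based on the DESED reference tsv format. A quick look with pandas:
# Inspect a prediction tsv file; column names are assumed to follow the
# DESED metadata convention (filename, onset, offset, event_label).
import pandas as pd

predictions = pd.read_csv(
    '/path/to/storage_root/dcase_2020_inference/<timestamp>/'
    'unlabel_in_domain_<timestamp>_best_frame_f1_hybrid.tsv',  # placeholder
    sep='\t',
)
print(predictions.head())
print(predictions['event_label'].value_counts())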
To add a custom pseudo labeled dataset to the DESED database, copy its events tsv file into the database's metadata directory and rerun create_json, e.g.:
$ cp /path/to/storage_root/dcase_2020_inference/<timestamp>/unlabel_in_domain_<timestamp>_best_frame_f1_hybrid.tsv /path/to/desed/real/metadata/train/
$ python -m pb_sed.database.desed.create_json -db /path/to/desed