pb_sed: Paderborn Sound Event Detection

Source code for DCASE 2021 Task 4 coming soon!

Installation

Install requirements:

$ pip install --user git+https://github.com/fgnt/lazy_dataset.git@d500d23d23c0cc2ebb874c4974b4ffa7a2418b96
$ pip install --user git+https://github.com/fgnt/paderbox.git@f0e7b0bf66a0ee6e5f51797305d84cf57227134d
$ pip install --user git+https://github.com/fgnt/padertorch.git@9985d398c10ec086e18f7525c7e7dc2809c1e7f3

Clone the repository:

$ git clone https://github.com/fgnt/pb_sed.git

Install package:

$ pip install --user -e pb_sed

Database

DESED (DCASE 2020 Task 4)

Install requirements:

$ pip install --user git+https://github.com/turpaultn/DESED@2fb7fe0b4b33569ad3693d09e50037b8b4206b72

Download the database by running

$ python -m pb_sed.database.desed.download -db /path/to/desed

yielding the following database structure:

├── real
│   ├── audio
│   │   ├── eval
│   │   │   ├── eval_dcase2019
│   │   │   └── eval_dcase2020
│   │   ├── train
│   │   │   ├── unlabel_in_domain
│   │   │   └── weak
│   │   └── validation
│   │       └── validation
│   ├── dataset
│   │   ├── audio
│   │   │   └── eval
│   │   └── metadata
│   │       └── eval
│   ├── metadata
│   │   ├── eval
│   │   ├── train
│   │   └── validation
│   └── missing_files
├── rir_data
│   ├── eval
│   ├── train
│   └── validation
└── synthetic
    ├── audio
    │   ├── eval
    │   │   └── soundbank
    │   └── train
    │       ├── soundbank
    │       └── synthetic20
    ├── dcase2019
    │   └── dataset
    │       ├── audio
    │       └── metadata
    └── metadata
        └── train
            └── synthetic20

Create a JSON file describing the database:

$ python -m pb_sed.database.desed.create_json -db /path/to/desed
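
The generated JSON follows the usual fgnt database convention of a top-level `datasets` mapping from dataset name to example IDs to per-example metadata. The exact schema is defined by `create_json`; the sketch below is a hypothetical simplification (the field names and example ID are made up for illustration):

```python
import json

# Hypothetical sketch of the generated database JSON (the real schema is
# defined by pb_sed.database.desed.create_json): a top-level "datasets"
# mapping from dataset name to example id to per-example metadata.
database = {
    "datasets": {
        "weak": {
            "example_id_0": {
                "audio_path": "/path/to/desed/real/audio/train/weak/example_id_0.wav",
                "events": ["Dog", "Speech"],  # weak (clip-level) labels
            },
        },
    },
}
json_str = json.dumps(database, indent=2)
```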

Experiments

DCASE 2020 Task 4

This repository provides the source code for the 3rd-place solution presented by Paderborn University for the DCASE 2020 Challenge Task 4: Sound event detection and separation in domestic environments. Our submitted system achieved 48.3% and 47.2% event-based F1-score on the validation and evaluation sets, respectively. Later improvements led to 52.8% (±0.6%) event-based F1-score on the validation set, outperforming the challenge winner by 2.2% on average (a comparison on the evaluation set is not possible, as the ground-truth evaluation labels are not public).
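
The scores above are event-based F1, where a detected event typically counts as correct only if its onset and offset match a reference event within a collar (200 ms in DCASE Task 4); the score itself is the standard F1 over event-level counts. A tiny worked example with illustrative counts (not taken from the paper):

```python
def event_f1(tp, fp, fn):
    """Standard F1 from event-level counts:
    F1 = 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# Illustrative counts only: 8 correct events, 2 spurious, 2 missed.
score = event_f1(tp=8, fp=2, fn=2)  # -> 0.8
```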

For details see our paper "Forward-Backward Convolutional Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for Weakly Labeled Semi-Supervised Sound Event Detection" [pdf]. If you are using this code, please cite our paper as follows:

@inproceedings{Ebbers2020,
    author = "Ebbers, Janek and Haeb-Umbach, Reinhold",
    title = "Forward-Backward Convolutional Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for Weakly Labeled Semi-Supervised Sound Event Detection",
    booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020)",
    address = "Tokyo, Japan",
    month = "November",
    year = "2020",
    pages = "41--45"
}

FBCRNN

To train an FBCRNN on only weakly labeled and synthetic data, run

$ python -m pb_sed.experiments.dcase_2020_task_4.train_crnn

The prepared DESED database already includes the following weakly pseudo-labeled datasets.

Pseudo weak labels used in our paper, i.e., generated by five different FBCRNN ensembles trained only on weakly labeled and synthetic data:

  • unlabel_in_domain_pseudo_weak_2020-07-03-20-48-45
  • unlabel_in_domain_pseudo_weak_2020-07-03-20-49-48
  • unlabel_in_domain_pseudo_weak_2020-07-03-20-52-19
  • unlabel_in_domain_pseudo_weak_2020-07-03-21-00-48
  • unlabel_in_domain_pseudo_weak_2020-07-03-21-05-34
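
These dataset names embed their generation timestamp (`YYYY-MM-DD-hh-mm-ss`). If you need to handle them programmatically, a small helper (hypothetical, not part of pb_sed) can recover it:

```python
import re
from datetime import datetime

def extract_timestamp(dataset_name):
    """Pull the YYYY-MM-DD-hh-mm-ss timestamp out of a pseudo-label
    dataset name such as 'unlabel_in_domain_pseudo_weak_2020-07-03-20-48-45'."""
    match = re.search(r"\d{4}(?:-\d{2}){5}", dataset_name)
    if match is None:
        raise ValueError(f"no timestamp in {dataset_name!r}")
    return datetime.strptime(match.group(), "%Y-%m-%d-%H-%M-%S")

ts = extract_timestamp("unlabel_in_domain_pseudo_weak_2020-07-03-20-48-45")
```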

Pseudo weak labels generated by the final FBCRNN ensembles from our paper, i.e., by five different FBCRNN ensembles trained on weakly labeled data, synthetic data and one of the weakly pseudo-labeled datasets from above:

  • unlabel_in_domain_pseudo_weak_2020-07-04-13-10-05
  • unlabel_in_domain_pseudo_weak_2020-07-04-13-10-19
  • unlabel_in_domain_pseudo_weak_2020-07-04-13-10-33
  • unlabel_in_domain_pseudo_weak_2020-07-04-13-11-09
  • unlabel_in_domain_pseudo_weak_2020-07-04-13-12-06

To train an FBCRNN leveraging, e.g., unlabel_in_domain_pseudo_weak_2020-07-04-13-10-05, run

$ python -m pb_sed.experiments.dcase_2020_task_4.train_crnn with 'unlabel_in_domain_pseudo_weak_timestamp=2020-07-04-13-10-05'

Each training stores checkpoints and metadata (incl. a tensorboard event file) in a directory /path/to/storage_root/dcase_2020_crnn/<timestamp>.

Tag-conditioned CNN

To train a tag-conditioned CNN, pseudo strong labels are required. The prepared DESED database already includes the following strongly pseudo-labeled datasets.

Pseudo strong labels used in our paper, i.e., generated by five different FBCRNN ensembles trained on weakly labeled data, synthetic data and one of the weakly pseudo-labeled unlabeled datasets from above:

  • weak_pseudo_strong_2020-07-04-13-10-05_best_frame_f1_crnn
  • weak_pseudo_strong_2020-07-04-13-10-19_best_frame_f1_crnn
  • weak_pseudo_strong_2020-07-04-13-10-33_best_frame_f1_crnn
  • weak_pseudo_strong_2020-07-04-13-11-09_best_frame_f1_crnn
  • weak_pseudo_strong_2020-07-04-13-12-06_best_frame_f1_crnn
  • unlabel_in_domain_pseudo_strong_2020-07-04-13-10-05_best_frame_f1_crnn
  • unlabel_in_domain_pseudo_strong_2020-07-04-13-10-19_best_frame_f1_crnn
  • unlabel_in_domain_pseudo_strong_2020-07-04-13-10-33_best_frame_f1_crnn
  • unlabel_in_domain_pseudo_strong_2020-07-04-13-11-09_best_frame_f1_crnn
  • unlabel_in_domain_pseudo_strong_2020-07-04-13-12-06_best_frame_f1_crnn

Pseudo strong labels generated by five different hybrid ensembles (4 FBCRNNs + 4 tag-conditioned CNNs) from our paper, where the CNNs were trained on a pair of strongly pseudo-labeled weak and unlabeled datasets from above plus synthetic data:

  • weak_pseudo_strong_2020-07-05-12-37-18_best_frame_f1_hybrid
  • weak_pseudo_strong_2020-07-05-12-37-26_best_frame_f1_hybrid
  • weak_pseudo_strong_2020-07-05-12-37-35_best_frame_f1_hybrid
  • weak_pseudo_strong_2020-07-05-12-37-45_best_frame_f1_hybrid
  • weak_pseudo_strong_2020-07-05-12-37-54_best_frame_f1_hybrid
  • unlabel_in_domain_pseudo_strong_2020-07-05-12-37-18_best_frame_f1_hybrid
  • unlabel_in_domain_pseudo_strong_2020-07-05-12-37-26_best_frame_f1_hybrid
  • unlabel_in_domain_pseudo_strong_2020-07-05-12-37-35_best_frame_f1_hybrid
  • unlabel_in_domain_pseudo_strong_2020-07-05-12-37-45_best_frame_f1_hybrid
  • unlabel_in_domain_pseudo_strong_2020-07-05-12-37-54_best_frame_f1_hybrid

To train a tag-conditioned CNN, run, e.g.,

$ python -m pb_sed.experiments.dcase_2020_task_4.train_cnn with 'pseudo_strong_suffix=2020-07-05-12-37-18_best_frame_f1_hybrid'

Each training stores checkpoints and metadata (incl. a tensorboard event file) in a directory /path/to/storage_root/dcase_2020_cnn/<timestamp>.

Hyperparameter tuning

To tune hyperparameters, namely decision thresholds, median filter sizes and the context length for FBCRNN-based SED, run

$ python -m pb_sed.experiments.dcase_2020_task_4.tune_hyper_params with 'crnn_dirs=["/path/to/storage_root/dcase_2020_crnn/<timestamp_crnn_1>","/path/to/storage_root/dcase_2020_crnn/<timestamp_crnn_2>",...]' 'cnn_dirs=["/path/to/storage_root/dcase_2020_cnn/<timestamp_cnn_1>","/path/to/storage_root/dcase_2020_cnn/<timestamp_cnn_2>",...]'

Hyperparameters are stored in an output directory /path/to/storage_root/dcase_2020_hyper_params/<timestamp>.
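
At inference time these parameters are applied roughly as follows: per-class frame scores are binarized with the tuned decision threshold and the binary decisions are smoothed with a median filter of the tuned size. A minimal sketch of this assumed post-processing (not the actual pb_sed implementation):

```python
import numpy as np

def detect_events(scores, threshold, medfilt_size):
    """Binarize frame-wise scores with a decision threshold, then smooth
    the binary decisions with a median filter of odd length medfilt_size."""
    decisions = (scores > threshold).astype(int)
    pad = medfilt_size // 2
    padded = np.pad(decisions, pad, mode="edge")
    # sliding-window median via a stack of shifted views
    windows = np.stack(
        [padded[i:i + len(decisions)] for i in range(medfilt_size)]
    )
    return np.median(windows, axis=0).astype(int)

scores = np.array([0.1, 0.9, 0.2, 0.8, 0.9, 0.85, 0.1])
# The isolated positive frame is removed and the short gap is filled:
smoothed = detect_events(scores, threshold=0.5, medfilt_size=3)
```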

Evaluation / Inference

To perform evaluation, run

$ python -m pb_sed.experiments.dcase_2020_task_4.run_inference with 'hyper_params_dir=/path/to/storage_root/dcase_2020_hyper_params/<timestamp>' 'dataset_names=["validation", "eval_dcase2019"]' 'reference_files=["/path/to/desed/real/metadata/validation/validation.tsv", "/path/to/desed/real/metadata/eval/eval_dcase2019.tsv"]'

To perform inference and write prediction files (pseudo labels) for other datasets, run

$ python -m pb_sed.experiments.dcase_2020_task_4.run_inference with 'hyper_params_dir=/path/to/storage_root/dcase_2020_hyper_params/<timestamp>' 'dataset_names=["weak", "unlabel_in_domain", "eval_dcase2020"]'

For all-in-one inference+evaluation, run:

$ python -m pb_sed.experiments.dcase_2020_task_4.run_inference with 'hyper_params_dir=/path/to/storage_root/dcase_2020_hyper_params/<timestamp>' 'dataset_names=["validation", "eval_dcase2019", "weak", "unlabel_in_domain", "eval_dcase2020"]' 'reference_files=["/path/to/desed/real/metadata/validation/validation.tsv", "/path/to/desed/real/metadata/eval/eval_dcase2019.tsv", None, None, None]'

Predictions are stored as tsv files in an output directory /path/to/storage_root/dcase_2020_inference/<timestamp>. To add a custom pseudo-labeled dataset to the DESED database, copy an event file to the database's metadata directory and rerun create_json, e.g.:

$ cp /path/to/storage_root/dcase_2020_inference/<timestamp>/unlabel_in_domain_<timestamp>_best_frame_f1_hybrid.tsv /path/to/desed/real/metadata/train/
$ python -m pb_sed.database.desed.create_json -db /path/to/desed
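
These event files are tab-separated with `filename`, `onset`, `offset` and `event_label` columns (the usual DCASE Task 4 metadata layout; treat the exact column names as an assumption). A minimal, self-contained parsing sketch:

```python
import csv
import io

# Minimal parse of a DESED-style event tsv; the row content here is
# made up for illustration.
tsv = (
    "filename\tonset\toffset\tevent_label\n"
    "clip_0.wav\t0.50\t2.30\tSpeech\n"
)
events = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
```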

License: MIT