Domestic environment sound event detection (DESED) dataset utilities. Mix of recorded and synthetic data (used in DCASE task 4 since 2019).
If you use this dataset, do not hesitate to update the list of papers below with your paper by doing a pull request. If you use and like this work, you can cite it 😊
- Website: https://project.inria.fr/desed/
- Zenodo datasets: DESED_synthetic, DESED_public_eval
- Papers:
- Turpault et al. Description of DESED dataset + official results of DCASE 2019 task 4.
- Serizel et al. Robustness of DCASE 2019 systems on synthetic evaluation set.
Table of contents
- Installation
- Usage
- Short description
- Long description
- List of papers/code using DESED
- FAQ
- Important updates
- Citing us
- References
To have your changes to the code in the desed/ folder taken into account (develop mode), install from source:
git clone https://github.com/turpaultn/DESED
cd DESED
pip install -e .
In this case, all your changes in the desed/ folder will be taken into account.
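As an optional sanity check (a minimal sketch, not part of the official instructions), you can print the location of the imported package; with an editable install it should point inside your cloned repository:

```python
import desed
print(desed.__file__)  # should point inside your cloned DESED/desed folder
```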
If you only copy code from the synthetic/code/ or real/code/ folders, without changing the desed/ content, install from PyPI:
pip install desed
import desed
desed.download_real("./data/dataset")
desed.download_desed_soundbank("./data/soundbank")
# Additional sets:
desed.download_fuss("./data/FUSS")
desed.download_fsd50k("./data/fsd50k", gtruth_only=True) # groundtruth only to use annotations for FUSS
- See examples
There are 3 different datasets:
- Recorded soundscapes (a.k.a. real).
- Soundbank to generate synthetic soundscapes.
- Public evaluation (recorded soundscapes) (a.k.a., Youtube in DCASE19, Vimeo is not available): DESED public eval
The DESED dataset is currently composed of 10 event classes in domestic environments. You can:
- Use only the real dataset.
- Use the soundbank to create your own synthetic soundscapes, i.e., generate new mixtures using Scaper [1] (see the sketch after this list).
- Reproduce the soundscapes made for DCASE task 4.
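For illustration, here is a minimal Scaper sketch that generates one 10-second soundscape from the downloaded soundbank. The foreground/background paths are assumptions about where and how the soundbank was extracted (adapt them to your layout), and the distributions are arbitrary examples rather than the official DCASE generation parameters:

```python
import scaper

# Assumed soundbank layout after desed.download_desed_soundbank("./data/soundbank")
fg_path = "./data/soundbank/audio/train/soundbank/foreground"
bg_path = "./data/soundbank/audio/train/soundbank/background"

sc = scaper.Scaper(duration=10.0, fg_path=fg_path, bg_path=bg_path, random_state=2021)
sc.ref_db = -50  # reference loudness in dB

# One randomly chosen background covering the whole clip.
sc.add_background(label=("choose", []), source_file=("choose", []), source_time=("const", 0))

# One randomly chosen foreground event with a random onset, duration and SNR.
sc.add_event(
    label=("choose", []),
    source_file=("choose", []),
    source_time=("const", 0),
    event_time=("uniform", 0, 9),
    event_duration=("uniform", 0.25, 5.0),
    snr=("uniform", 6, 30),
    pitch_shift=None,
    time_stretch=None,
)

# Writes the audio, a JAMS annotation and a tab-separated annotation file.
sc.generate("soundscape.wav", "soundscape.jams", txt_path="soundscape.txt", reverb=None)
```

The JAMS file written by sc.generate describes the generated mixture, so the soundscape can later be re-created or modified.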
There are 3 different datasets:
- Recorded soundscapes (a.k.a., real).
- Synthetic soundbank + DCASE task 4 soundscapes: DESED_synthetic
- Public evaluation (recorded soundscapes) (a.k.a., Youtube in DCASE19, Vimeo is not available): DESED public eval
All these datasets contain an "audio" folder associated with a "metadata" folder, so they can all be grouped together by simply merging them.
The DESED dataset is currently composed of 10 event classes in domestic environments. The soundbank can include annotated data outside of these 10 classes to allow the creation of more realistic soundscapes.
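As an illustration, a minimal sketch of merging several downloaded subsets into one dataset folder by copying their audio/ and metadata/ trees; the subset paths below are assumptions about where each part was downloaded:

```python
import shutil
from pathlib import Path

subsets = ["./data/dataset", "./data/public_eval", "./data/synthetic"]  # hypothetical locations
merged = Path("./data/desed_merged")

for subset in subsets:
    for sub_dir in ("audio", "metadata"):
        src = Path(subset) / sub_dir
        if not src.exists():
            continue
        for file in src.rglob("*"):
            if file.is_file():
                dest = merged / sub_dir / file.relative_to(src)
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(file, dest)
```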
Overview:
- Recorded soundscapes:
  - Verified and unverified subsets of Audioset:
    - Unlabel_in_domain data: unverified data have their labels discarded: 14412 files.
    - Weakly labeled data: training data have their labels verified at the clip level: 1578 files.
    - Validation data have their labels with time boundaries (strong labels): 1168 files.
    - Public evaluation files: 692 Youtube files.
- Soundbank:
  - Background files are extracted from SINS [2], TUT [7], MUSAN [3] or Youtube and have been selected because they contain a very low amount of our sound event classes.
  - Foreground files are extracted from Freesound [4][5], manually verified to check the quality, and segmented to remove silences.
  - Mixtures are described in Generating new synthetic soundscapes below.
  - Soundbank sizes:
    - Training: 2060 background files (SINS) and 1009 foreground files (Freesound).
    - Eval: 12 (Freesound) + 5 (Youtube) background files and 314 foreground files (Freesound).
- DCASE 2019:
  - It uses the synthetic soundbank, recorded soundscapes and the public evaluation data (a.k.a., Youtube eval during DCASE19).
  - For more information about the DCASE19 dataset, visit the DCASE 2019 task 4 web page.
  - If you only want to download the DCASE19 files, go to dcase2019 task 4.
- Why don't we have a single dataset repository?
  The synthetic soundbank and the recorded soundscapes can be used independently for different purposes. For example, one can create new synthetic soundscapes and evaluate a system on synthetic data only to focus on a specific problem.
- Why isn't the audio always included in the repository?
  Because of license issues (for example, SINS in the training soundbank). We do not have this problem for the evaluation data because we took care to avoid it after running into the issue.
- I have a problem downloading the recorded soundscapes. What can I do?
  If you are in a country with Youtube restrictions, you can try to use a VPN and the --proxy option of youtube-dl.
  You can also try upgrading youtube-dl, since it is regularly updated.
  Finally, if you managed to download most of the files, you can send your list of missing files (missing_files_XXX.tsv) by mail to Francesca Ronchini, Romain Serizel and/or Nicolas Turpault.
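  For example, a minimal sketch of downloading one clip through a proxy with the youtube-dl Python API; the proxy address and video id below are placeholders, not real values:

```python
import youtube_dl  # pip install youtube-dl

ydl_opts = {
    "proxy": "socks5://127.0.0.1:1080",      # placeholder proxy address
    "format": "bestaudio/best",              # fetch the best available audio stream
    "outtmpl": "downloads/%(id)s.%(ext)s",   # where to store the downloaded file
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(["https://www.youtube.com/watch?v=XXXXXXXXXXX"])  # placeholder video id
```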
- How do I evaluate and compare my system with other methods using this dataset?
  In this paper you can refer to the column "Youtube" and, for further studies, you can cite the DESED public evaluation set.
  Feel free to add your paper to the list_papers_using_desed.md file if you use the dataset and have a result on the public evaluation set:
| Paper | Code |
|---|---|
| Turpault et al., DCASE workshop 2019. | https://github.com/turpaultn/DCASE2019_task4 |
| Serizel et al., ICASSP 2020 | https://github.com/turpaultn/DESED |
| Turpault et al., ICASSP 2020 | https://github.com/turpaultn/walle |
| Turpault et al., preprint | https://github.com/turpaultn/dcase20_task4/tree/papers_code |
| Turpault et al., preprint | https://github.com/turpaultn/dcase20_task4/tree/papers_code |
Note: to add your paper to README.md before doing the pull request, run python generate_table.py
- 26th February 2021, v1.2.5, refactor: get rid of bash files and ease the download through the package.
- 7th December 2020, v1.2.2, ease the download of the soundbank (with or without pre-split validation).
- 23rd April 2020, v1.2.0, update the generation procedure (add_fg_event_non_noff) to use all parts of files longer than the duration of the soundscapes created + add the possibility to use only backgrounds from certain labels (i.e., sins or tut).
- 18th March 2020, v1.1.7, update DESED_synth_dcase20_train_jams.tar of DESED_synth. These JAMS use pitch shifting, while the previous ones did not; they are the final JAMS used for the dcase2020 baseline. Reverb is also commented out since it is not used for the baseline.
The Python code is publicly available under the MIT license; see the LICENSE file. The Matlab code is taken from the Audio Degradation Toolbox [6]; see the LICENSE file.
The different datasets contain a license file at their root for the attribution of each file.
The different platforms used are: Freesound [4][5], Youtube, MUSAN [3] and SINS [2].
Using this repository and happy to give attribution? Here is how to cite us:
- N. Turpault, R. Serizel, A. Parag Shah, J. Salamon. Sound event detection in domestic environments with weakly labeled data and soundscape synthesis. Workshop on Detection and Classification of Acoustic Scenes and Events, Oct 2019, New York City, USA.
- R. Serizel, N. Turpault, A. Shah, J. Salamon. Sound event detection in synthetic domestic environments. ICASSP, May 2020, Barcelona, Spain.
[1] J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello. Scaper: A library for soundscape synthesis and augmentation. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, Oct. 2017.
[2] Gert Dekkers, Steven Lauwereins, Bart Thoen, Mulu Weldegebreal Adhana, Henk Brouckxon, Toon van Waterschoot, Bart Vanrumste, Marian Verhelst, and Peter Karsmakers. The SINS database for detection of daily activities in a home environment using an acoustic sensor network. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 32–36. November 2017.
[3] David Snyder and Guoguo Chen and Daniel Povey. MUSAN: A Music, Speech, and Noise Corpus. arXiv, 1510.08484, 2015.
[4] F. Font, G. Roma & X. Serra. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia. ACM, 2013.
[5] E. Fonseca, J. Pons, X. Favory, F. Font, D. Bogdanov, A. Ferraro, S. Oramas, A. Porter & X. Serra. Freesound Datasets: A Platform for the Creation of Open Audio Datasets. In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017.
[6] M. Mauch and S. Ewert, “The Audio Degradation Toolbox and its Application to Robustness Evaluation”. In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR 2013), Curitiba, Brazil, 2013.
[7] A. Mesaros, T. Heittola and T. Virtanen, “TUT database for acoustic scene classification and sound event detection”. In Proceedings of the 24th European Signal Processing Conference (EUSIPCO), Budapest, Hungary, 2016.