Sound event detection system submitted to the DCASE 2016 (Detection and Classification of Acoustic Scenes and Events) challenge.
A convolutional neural network is used to detect and classify polyphonic sound events from a long temporal context of filter bank acoustic features. Training data are augmented with sox speed perturbation.
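For reference, sox-based speed perturbation follows the pattern sketched below. This is a minimal sketch, not the exact contents of src/make_speed.sh; the perturbation factors 0.9 and 1.1 and the file names are assumptions.

```sh
#!/bin/bash
# Minimal sketch of sox speed perturbation (assumed factors 0.9 and 1.1;
# the actual factors and file layout used by src/make_speed.sh may differ).
in_wav=audio/example_recording.wav   # hypothetical input file

for factor in 0.9 1.1; do
    out_wav="${in_wav%.wav}_sp${factor}.wav"
    # The sox "speed" effect resamples the signal, changing both tempo and pitch.
    sox "$in_wav" "$out_wav" speed "$factor"
done
```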
On the development data set the system achieves a 0.84 segment error rate (7.7% relative improvement over the baseline) and a 36.3% F-measure (55.1% relative improvement over the baseline system).
Technical details are described in the challenge report. A detailed summary of results on the development and evaluation audio is also available.
run-cnn-pipeline.sh - complete self-documented script for reproducing all the experiments, including the following:
- task3_gmm_baseline.py - baseline GMM system provided by the organizers
- src/make_downsample.sh - basic data preparation (down-sampling); see the sketch after this list
- task3_cnn.py - CNN-based system training and testing
- src/make_speed.sh - speed perturbation
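As a rough illustration of the data preparation step, down-sampling with sox could look like the sketch below. The 16 kHz mono target and the directory names are assumptions, not necessarily what src/make_downsample.sh uses.

```sh
#!/bin/bash
# Minimal sketch of down-sampling with sox (assumed 16 kHz mono target;
# src/make_downsample.sh may use different parameters).
src_dir=audio/original   # hypothetical directory with the original recordings
dst_dir=audio/16k
mkdir -p "$dst_dir"

for f in "$src_dir"/*.wav; do
    # "rate 16000" resamples to 16 kHz; "remix 1" keeps a single channel.
    sox "$f" "$dst_dir/$(basename "$f")" rate 16000 remix 1
done
```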