Challenge info: http://machine-listening.eecs.qmul.ac.uk/bird-audio-detection-challenge/
The task is to design a system that, given a short audio recording, returns a binary decision for the presence/absence of bird sound (bird sound of any kind).
A fairly straightforward implementation of convolutional detectors for this task, nothing too fancy. Data augmentation consists of:
- noise corruption of training files,
- averaging of two training spectrograms at random,
- time-shifts of whole recordings,
- permutations of spectrogram segments.
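The four augmentations above can be sketched in NumPy as follows. This is a minimal illustration, not the code used for the submission: the function names, the Gaussian noise model, the noise scale, the circular time shift, and the 64-bin spectrogram shape are all assumptions, and the noise is applied to the spectrogram here for brevity although the list says it was applied to training files.

```python
import numpy as np

def add_noise(spec, rng, scale=0.1):
    """Corrupt a spectrogram with additive Gaussian noise (noise model is assumed)."""
    return spec + rng.normal(0.0, scale, size=spec.shape)

def average_pair(spec_a, spec_b):
    """Average two equally shaped training spectrograms."""
    return 0.5 * (spec_a + spec_b)

def time_shift(spec, rng):
    """Shift the whole recording in time by a random offset (circular shift assumed)."""
    offset = int(rng.integers(0, spec.shape[1]))
    return np.roll(spec, offset, axis=1)

def permute_segments(spec, rng, n_segments=5):
    """Split the time axis into n_segments chunks and reorder them at random."""
    segments = np.array_split(spec, n_segments, axis=1)
    order = rng.permutation(n_segments)
    return np.concatenate([segments[i] for i in order], axis=1)

rng = np.random.default_rng(0)
spec_a = rng.random((64, 475))  # (frequency bins, time frames); 64 bins is illustrative
spec_b = rng.random((64, 475))

mixed = average_pair(spec_a, spec_b)
augmented = permute_segments(time_shift(add_noise(mixed, rng), rng), rng)
```

How the labels of two averaged spectrograms were combined is not stated here, so the sketch leaves labels out.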
Final submission consists of averaged predictions from 6 nets:
High resolution spectrogram convnet (run-A):
- trained on whole dataset,
- 5 conv/pooling LeakyReLU layers with batch normalization and delta channel,
- global max pooling before output.
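Two of the ingredients above, the delta channel and global max pooling, can be illustrated without a deep learning framework. The sketch below assumes the delta is a simple first-order difference along time (a regression-based delta is also common) and that pooling runs over both the frequency and time axes; the function names are hypothetical.

```python
import numpy as np

def add_delta_channel(spec):
    """Stack a spectrogram with its first-order time difference as a second channel."""
    # Prepending the first frame makes the first delta frame zero and keeps the length.
    delta = np.diff(spec, axis=1, prepend=spec[:, :1])
    return np.stack([spec, delta], axis=0)  # shape: (2, freq, time)

def global_max_pool(feature_maps):
    """Reduce (channels, freq, time) feature maps to one value per channel."""
    return feature_maps.max(axis=(1, 2))

spec = np.arange(12.0).reshape(3, 4)   # toy spectrogram: 3 bins, 4 frames
net_input = add_delta_channel(spec)    # shape (2, 3, 4)
pooled = global_max_pool(net_input)    # one maximum per channel
```

Global max pooling makes the output independent of where in the recording the strongest activation occurs, which suits a presence/absence decision.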
Same as run-A, different random seed.
Same as run-A, but low resolution (downsampled).
Same as run-A_ds, different random seed.
Low resolution spectrogram convnet.
When compared to runs A_ds/B_ds:
- fewer filters,
- no L2 regularization,
- more epochs,
- faster learning rate decay,
- less dropout.
Originally used shuffle_spec() with 5 segments (spec length of 475) and fewer noise augmentations to generate the predictions.
High resolution spectrogram convnet.
When compared to runs A/B:
- more filters,
- lower regularization,
- lower learning rate,
- more epochs.
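Averaging the predictions of the six nets can be sketched as below. The probabilities are made-up placeholder values, the function name is hypothetical, and the 0.5 threshold for the binary decision is an assumption; equal weighting of the nets is assumed as well.

```python
import numpy as np

def average_predictions(per_net_probs):
    """Average per-recording probabilities over nets (rows = nets, columns = recordings)."""
    per_net_probs = np.asarray(per_net_probs, dtype=float)
    return per_net_probs.mean(axis=0)

# Placeholder outputs of 6 nets for 2 recordings (illustrative values only).
probs = [
    [0.90, 0.20],
    [0.80, 0.10],
    [0.95, 0.30],
    [0.85, 0.25],
    [0.70, 0.40],
    [0.90, 0.15],
]
ensemble = average_predictions(probs)
decisions = ensemble >= 0.5  # assumed threshold for presence/absence
```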