Group 1 in Pierre Baldi's Spring 2020 offering of CS 172B: Neural Networks and Deep Learning.
- Taneisha Arora (arorat@uci.edu)
- Thanasi Bakis (abakis@uci.edu)
- Theja Krishna (takrishn@uci.edu)
- Bryon Tjanaka (btjanaka@uci.edu)
We explore how the choice of audio features affects performance when detecting bird sounds in 10-second audio recordings, as described in the DCASE 2018 Challenge. Please see our report for more info.
Install the dependencies:
pip install -r requirements.txt
And you should be good to go.
The main interface is the hhh.detector script. Run
python -m hhh.detector -h
for full help info.
Given a dataset with labels in LABELS.csv and WAV audio files in WAV_DIR, train a detector with all available audio features using:
python -m hhh.detector \
--labels-csv-path LABELS.csv \
--wav-dir WAV_DIR \
--features-npy-path FEATURES.npy \
--labels-npy-path LABELS.npy \
--dsp all
FEATURES.npy and LABELS.npy will store the computed features and labels of the dataset, so they do not need to be re-computed in the future. The next time the script is run, pass the --use-saved-features flag to use these cached features and labels.
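The caching idea can be sketched in a few lines of Python. This is only an illustration, not the code in hhh.detector: the functions compute_features and load_or_compute are hypothetical, and the "features" are random numbers standing in for real audio features.

```python
import numpy as np


def compute_features(num_clips):
    # Hypothetical stand-in for real audio feature extraction.
    rng = np.random.default_rng(0)
    features = rng.random((num_clips, 4))
    labels = np.arange(num_clips) % 2
    return features, labels


def load_or_compute(features_path, labels_path, num_clips, use_saved):
    """Return (features, labels), reading the cache only when requested."""
    if use_saved:
        # Mirrors the role of --use-saved-features: skip the computation.
        return np.load(features_path), np.load(labels_path)
    features, labels = compute_features(num_clips)
    # Store the arrays so that later runs can reuse them.
    np.save(features_path, features)
    np.save(labels_path, labels)
    return features, labels
```

Calling load_or_compute once with use_saved=False and then again with use_saved=True should return the same arrays, with the second call reading them from disk.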
The above command will also save the model to detector.pth. This filepath can be changed with the --model-save-path flag. To load a model, use the --model-load-path flag. Note that if you load a model, it will not be trained further by default. To continue training, pass the --continue-training flag.
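The way these three flags interact can be summarized with a small sketch. It is not the parser used by hhh.detector: build_parser and should_train are hypothetical helpers, and the default of None for --model-load-path is an assumption. Only the flag names and the detector.pth default come from the description above.

```python
import argparse


def build_parser():
    # Only the model-related flags described above; the real script has more.
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-save-path", default="detector.pth")
    # Assumption: no model is loaded unless a path is given.
    parser.add_argument("--model-load-path", default=None)
    parser.add_argument("--continue-training", action="store_true")
    return parser


def should_train(args):
    # Train when starting fresh, or when resuming was explicitly requested.
    return args.model_load_path is None or args.continue_training
```

Under these assumptions, passing only --model-load-path skips training, while adding --continue-training resumes it.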
Metrics are logged to the directory indicated by the --tensorboard-dir flag (tensorboard-logs/ by default). To view these metrics, run
tensorboard --logdir <DIR>
and go to http://localhost:6006 in your browser.
You can modify several of the training hyperparameters, including the number of training epochs (--epochs) and the batch size (--batch-size).
Pass the --force-cpu flag to force training on a CPU.
To run our experiments, first create a new directory by running the command
mkdir -p data/bird-audio-detection
from the root directory.
Next, navigate to http://dcase.community/challenge2018/task-bird-audio-detection and download all three datasets listed under Development Datasets.
Rename and relocate the data labels as follows:
mv ff1010bird_metadata_2018.csv data/bird-audio-detection/ff1010-labels.csv
mv warblrb10k_public_metadata_2018.csv data/bird-audio-detection/warblr-labels.csv
mv BirdVoxDCASE20k_csvpublic.csv data/bird-audio-detection/birdvox-labels.csv
After unzipping, rename and relocate the audio files as follows:
mv ff1010bird_wav data/bird-audio-detection/ff1010-wav
mv warblrb10k_public_wav data/bird-audio-detection/warblr-wav
mv BirdVox-DCASE-20k data/bird-audio-detection/birdvox-wav
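Before training, it can help to confirm that the files ended up where the commands above put them. The following is a minimal sketch and not part of the repository: missing_paths is a hypothetical helper that only checks the six paths named above.

```python
from pathlib import Path

# Dataset prefixes used in the mv commands above.
DATASETS = ("ff1010", "warblr", "birdvox")


def missing_paths(root="data/bird-audio-detection"):
    """Return the expected label files and WAV directories that are absent."""
    root = Path(root)
    missing = []
    for name in DATASETS:
        labels = root / f"{name}-labels.csv"
        wav_dir = root / f"{name}-wav"
        if not labels.is_file():
            missing.append(labels)
        if not wav_dir.is_dir():
            missing.append(wav_dir)
    return missing


if __name__ == "__main__":
    # Run from the root directory of the repository.
    for path in missing_paths():
        print(f"missing: {path}")
```

An empty result means all three label files and all three audio directories were found.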
To perform some analysis on the data, open Jupyter Lab with:
jupyter lab
Within Jupyter Lab, open notebooks/exploratory-analysis.ipynb and run the cells to perform exploratory analysis on the data.
Shell scripts containing a few of the run commands we used to train our final models can be found in the scripts/ directory. For example, to train a model on the BirdVox-DCASE-20k dataset, run the following command from the root directory:
bash scripts/run_birdvox.sh
Scripts for the Warblr10k and FreeField1010 datasets are located in the same directory.