ranchlai / sound_classification

pretrained sound classification models on esc-50. acc=0.937

Sound classification

This example demonstrates how to take a model pretrained on AudioSet and fine-tune it on a smaller sound classification dataset, namely ESC-50.

The tricks and tips are as follows:

  • Use the pre-trained Cnn14 model from audioset_tagging_cnn [1]. The model is pre-trained on AudioSet, the largest weakly labelled sound event dataset (clips carry class labels only, without the exact time location of each event).
  • No weight decay is used.
  • In training, each input spectrogram has 384 frames, selected by random cropping.
  • In test/eval, all 501 frames are used, so the result is deterministic.
  • The same learning-rate decay schedule as in the baseline is used.
  • Large dropout is used.
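
The difference between the training and evaluation inputs can be sketched as follows. This is a minimal illustration, not the repository's actual code: the helper name `random_crop`, the (frames, mel bins) array layout, and the 64 mel bins are assumptions; only the 384 and 501 frame counts come from the list above.

```python
import numpy as np

TRAIN_FRAMES = 384  # crop length used in training, per the list above


def random_crop(spec, num_frames=TRAIN_FRAMES, rng=None):
    """Return a random window of `num_frames` along the time axis (axis 0)."""
    rng = rng or np.random.default_rng()
    total = spec.shape[0]
    if total <= num_frames:
        return spec  # clip is already short enough; nothing to crop
    # Highest valid start index is total - num_frames (upper bound is exclusive).
    start = int(rng.integers(0, total - num_frames + 1))
    return spec[start:start + num_frames]


# Hypothetical spectrogram: 501 frames x 64 mel bins (layout and bin count assumed)
spec = np.random.default_rng(0).standard_normal((501, 64)).astype(np.float32)

train_input = random_crop(spec)  # 384 frames, a different window on each call
eval_input = spec                # all 501 frames, identical on every run
```

Because evaluation uses the whole clip, no randomness enters at test time, which is why the reported result is reproducible.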

Testing

First, download the ESC-50 dataset by following this link. Then run the following command to test on the ESC-50 dataset.

python test.py -a <audio_folder> -m <meta_file> -d gpu

Results

Without any tricks, this example achieves an average accuracy of 0.937 across the 5 folds, ranking No. 3 on the leaderboard (as of 2021-08).
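
ESC-50 ships with five predefined folds, and the figure above is the mean of the per-fold accuracies. A minimal sketch of that averaging is shown here, assuming you already have a fold id, a true label, and a predicted label for every clip. The function name `mean_fold_accuracy` and the toy data are hypothetical and are not taken from test.py.

```python
import numpy as np


def mean_fold_accuracy(folds, labels, preds):
    """Return ({fold: accuracy}, mean accuracy over folds)."""
    folds, labels, preds = map(np.asarray, (folds, labels, preds))
    per_fold = {}
    for f in np.unique(folds):
        mask = folds == f  # clips belonging to this fold
        per_fold[int(f)] = float((labels[mask] == preds[mask]).mean())
    return per_fold, float(np.mean(list(per_fold.values())))


# Toy data for illustration only (two folds, four clips)
folds = [1, 1, 2, 2]
labels = [0, 1, 2, 3]
preds = [0, 1, 2, 0]

per_fold, mean_acc = mean_fold_accuracy(folds, labels, preds)
print(per_fold, mean_acc)
```

Averaging per fold rather than over all clips pooled together gives each fold equal weight, which matters only when folds differ in size; in ESC-50 the folds are equally sized, so both give the same number.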

Training

TBD

Requirements

# install paddleaudio
git clone https://github.com/PaddlePaddle/models.git
cd models/PaddleAudio
pip install -e .
# clone this repo and install its requirements
git clone https://github.com/ranchlai/sound_classification.git
cd sound_classification
pip install -r requirements.txt

References

[1] Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, Mark D. Plumbley. "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition." arXiv preprint arXiv:1912.10211 (2019).

[2] Urban Sound Tagging using Multi-Channel Audio Feature with Convolutional Neural Networks
