AudioMask

Implementation of the paper 'AudioMask: Robust Sound Event Detection Using Mask R-CNN and Frame-Level Classifier': https://ieeexplore.ieee.org/document/8995448

Creating Data:

1. Create the desired dataset, with the desired event probability, from the TUT dataset
2. Create mel-spectrograms of the audio files, together with the corresponding masks, for Mask R-CNN
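
As a rough illustration of step 2, the sketch below builds a binary mask for one annotated event over a spectrogram grid. It is not the repository's code: the function name `event_mask`, the default `sr` and `hop_length` values, and the choice to mark every mel bin inside the event's time span are assumptions made for the example. The spectrogram itself would normally come from an audio library and is not computed here.

```python
import math


def event_mask(n_mels, n_frames, onset_s, offset_s, sr=44100, hop_length=512):
    """Return a binary mask (list of rows) for one event.

    Assumption: the event covers all mel bins between its onset and offset,
    so every row of the mask is identical.
    """
    frames_per_second = sr / hop_length
    # Clamp the event's frame range to the spectrogram width.
    first = max(0, int(onset_s * frames_per_second))
    last = min(n_frames, int(math.ceil(offset_s * frames_per_second)))
    row = [1 if first <= t < last else 0 for t in range(n_frames)]
    return [row[:] for _ in range(n_mels)]
```

Each mask has the same shape as its spectrogram, so a mask and its spectrogram can be stored as a pair for training.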

Mask R-CNN:

1. Train a Mask R-CNN model for the specific event on a training set of that event, unless a trained model is already available
2. Run Mask R-CNN on the test set and output every region whose event presence probability is above 0.5
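
The thresholding in step 2 can be sketched as follows. The detection format (a dict with hypothetical `box` and `score` keys) is an assumption for illustration only; the actual output structure depends on the Mask R-CNN implementation used.

```python
def keep_confident_regions(detections, threshold=0.5):
    """Keep detections whose score is strictly above the threshold.

    Each detection is assumed to be a dict with a "score" key in [0, 1]
    and a "box" key holding (x_min, y_min, x_max, y_max) in pixels.
    """
    return [d for d in detections if d["score"] > threshold]


detections = [
    {"box": (10, 0, 40, 64), "score": 0.92},
    {"box": (55, 0, 70, 64), "score": 0.31},
    {"box": (80, 0, 95, 64), "score": 0.50},
]
confident = keep_confident_regions(detections)
```

With the strict comparison, a region scored exactly 0.5 is dropped, which matches "above 0.5" in the step description.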

Read the Mask R-CNN report and sort it using process_file_for_evaluation.py

Frame-level Classifier:

1. Train the frame-level classifier on segments of the audio files, unless a trained classifier is already available
2. Convert the list of regions proposed by Mask R-CNN to segments
3. Run the frame-level classifier on these segments
4. Choose the true segments based on the probability produced by the classifier and the confidence of the Mask R-CNN
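
Steps 2 and 4 can be sketched as below. Both the pixel-to-time conversion parameters and the fusion rule are assumptions: this README does not state how the two scores are combined, so the example simply averages them and applies a threshold. The names `region_to_segment` and `select_segments` are hypothetical.

```python
def region_to_segment(x_min, x_max, sr=44100, hop_length=512):
    """Convert a region's horizontal extent (spectrogram frames) to seconds."""
    seconds_per_frame = hop_length / sr
    return (x_min * seconds_per_frame, x_max * seconds_per_frame)


def select_segments(candidates, threshold=0.5):
    """Keep segments whose combined score reaches the threshold.

    candidates: list of (onset_s, offset_s, mask_score, clf_prob).
    Assumed fusion rule: unweighted mean of the Mask R-CNN confidence
    and the frame-level classifier probability.
    """
    kept = []
    for onset, offset, mask_score, clf_prob in candidates:
        combined = (mask_score + clf_prob) / 2
        if combined >= threshold:
            kept.append((onset, offset, combined))
    return kept
```

A weighted mean or a product of the two scores would fit the same interface; only the line computing `combined` would change.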

Calculate the F1-score and the error rate (ER)
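
For reference, a minimal sketch of segment-based F1 and ER for a single event class is given below. It assumes the reference and the prediction have already been converted to one 0/1 activity value per fixed-length segment; the repository's own evaluation may differ. With one class there are no substitutions, so ER reduces to (deletions + insertions) / (number of active reference segments).

```python
def segment_metrics(reference, predicted):
    """Return (f1, er) for equal-length lists of 0/1 segment activity."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    fp = sum(1 for r, p in pairs if not r and p)  # insertions
    fn = sum(1 for r, p in pairs if r and not p)  # deletions
    n_ref = tp + fn  # active segments in the reference
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    # ER is undefined when the reference contains no active segment.
    er = (fp + fn) / n_ref if n_ref else float("nan")
    return f1, er


f1, er = segment_metrics([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
```

Here there are two true positives, one insertion, and one deletion, so both F1 and ER come out to 2/3.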
