convolutional-neural-networks irmas music-information-retrieval tensorflow

irmas-cnn

Experiments on the IRMAS dataset using Convolutional Neural Networks.

This repository contains code and Jupyter Notebooks of my attempts on the IRMAS dataset.

The IRMAS dataset [link] is used for musical instrument recognition in audio tracks. It consists of:

The trainset: Contains 3-second tracks of solo instruments. There are 11 instruments
The testset: A collection of multi-instrumental audio tracks. Each track is labeled with at least one of the instruments, which is considered its "dominant" sound.

In this project, I have extracted different features from the audio signal, which were fed to a Convolutional Neural Network. The two networks included in the project:

VGG-16 [paper]
A variation of the YOLO architecture [paper]

All of the models are implemented using the new higher level Tensorflow API [link].

There are Jupyter Notebooks for two experiments:

Using Mel-Frequency Cepstrum as feature with YOLO-like CNN here
Using several handpicked features with a VGG-16 architecture here

Using the DatasetPreprocessor

The repository includes the DatasetPreprocessor class that extracts features from the raw files of the IRMAS dataset and stores them in easy to use .h5 files. All features are generated using Librosa [link].

Initialize the DatasetPreprocessor object like this:

dp = DatasetPreprocessor('mel')

or

dp = DatasetPreprocessor('handpicked')

The mel option extract the Mel-Frequency Cepstrum as feature The handpicked option extracts the following features:

Spectral Centroid
Spectral Bandwidth
Spectral Rolloff
Zero-crossing rate
RMSE
MFCC

To generate the train and test sets call

dp.generateTrain('path/to/trainset/folder')
dp.generateTest('path/to/testset/folder')

Use

dp.normalizeGain('path/to/trainset/folder')

to normalize the gain of all tracks in a folder, to a specific dB value.

About

Experiments on musical instrument recognition with the IRMAS dataset and Convolutional Neural Networks

convolutional-neural-networks irmas music-information-retrieval tensorflow

Languages

Language:Jupyter Notebook 65.1%Language:Python 34.9%