General information

This repository contains a complete chord recognition system. The system takes an audio file as input and returns the timing of each chord according to the MIREX categories (see the example below). The implementation is based on a recurrent neural network with LSTM cells.

In addition, the repository contains a realtime demo and a simple web app that lets the user upload a song and get its chords. A demo video of realtime chord recognition is available here.
The project was implemented as part of a Master's thesis; the full text is available here.

Datasets

TheBeatles180 Isophonics dataset
- audio - extract to data/audio
- gt - extract to data
- converted - extract to data/converted
- converted with librosa
- converted_list - extract to the root folder
CaroleKingQueen Isophonics dataset
USPop2002
JayChou29
- audio
Full datasets

Structure of repo:

data
- audio - default folder for raw audio files
- converted - default folder for saving preprocessed data in CSV format
- gt - contains .lab files with chords
- tracklists - contains lists of paths to audio files starting with the 'audio_root' parameter
- predicted - contains predictions for the TheBeatles180 and JayChou29 datasets, reports generated by MusOOEvaluator, and pretrained models
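
The .lab files in gt are plain-text chord annotations. A minimal parsing sketch follows, assuming the usual Isophonics layout of one segment per line with start time, end time and chord label separated by whitespace. The function name parse_lab and the sample text are illustrative and not part of the repository.

```python
from typing import List, Tuple

# (start_sec, end_sec, chord_label)
Segment = Tuple[float, float, str]


def parse_lab(text: str) -> List[Segment]:
    """Parse .lab content into (start, end, label) tuples.

    Assumes whitespace-separated columns: start, end, label.
    """
    segments: List[Segment] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        start, end, label = line.split(maxsplit=2)
        segments.append((float(start), float(end), label))
    return segments


# Made-up sample for illustration, not taken from the dataset.
sample = """0.000 2.612 N
2.612 11.459 E
11.459 12.921 A:min
"""

segments = parse_lab(sample)
for start, end, label in segments:
    print(f"{start:7.3f} {end:7.3f} {label}")
```

To read a real annotation, pass the file contents instead of the sample string, for example parse_lab(open(path).read()).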

Installation

Download raw and converted datasets

bash ./data/download.sh

Preprocessing

The system computes notegrams (252 bins per sample) as described in Mauch 2010 (p. 98), with hop_length=512, sample_rate=11025, and window_size=4096.
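
The time resolution implied by these settings can be worked out directly. The sketch below only does the arithmetic; the helper names are hypothetical, and the exact number of frames the preprocessing script produces may differ by one depending on how it pads the signal.

```python
SAMPLE_RATE = 11025  # Hz, from the settings above
HOP_LENGTH = 512     # samples between consecutive frames
WINDOW_SIZE = 4096   # samples per analysis window


def frame_duration(hop_length: int = HOP_LENGTH,
                   sample_rate: int = SAMPLE_RATE) -> float:
    """Seconds between the starts of two consecutive frames."""
    return hop_length / sample_rate


def frame_to_time(frame_index: int,
                  hop_length: int = HOP_LENGTH,
                  sample_rate: int = SAMPLE_RATE) -> float:
    """Start time in seconds of the given frame."""
    return frame_index * hop_length / sample_rate


def time_to_frame(seconds: float,
                  hop_length: int = HOP_LENGTH,
                  sample_rate: int = SAMPLE_RATE) -> int:
    """Index of the last frame that starts at or before `seconds`."""
    return int(seconds * sample_rate // hop_length)


print(f"frame step:     {frame_duration() * 1000:.1f} ms")
print(f"window length:  {WINDOW_SIZE / SAMPLE_RATE * 1000:.1f} ms")
print(f"frames in 40 s: {time_to_frame(40)}")
```

Each frame therefore advances by roughly 46 ms, while each analysis window covers roughly 372 ms of audio, so consecutive windows overlap heavily.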

How to use:

python preprocess.py --songs_list data/tracklists/TheBeatles180List

Parameters:

--songs_list, required, examples can be found in data/tracklists
--audio_root, default: data/audio/
--gt_root, default: data/gt/
--conv_root, default: data/converted/, folder where converted datasets are saved
--subsong_len, default: 40, length in seconds of the parts each song is split into during preprocessing
--song_len, default: 180, if subsong_len is not specified, the song is cut or zero-padded to song_len
--use_librosa, default: True, librosa.cqt is used for preprocessing by default
--num_bins, default: 84, number of bins used for the CQT
--modulation_steps, list, default: [0], modulation steps applied during CQT preprocessing
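
The interplay of --subsong_len and --song_len can be sketched as follows. This is an illustration of the described behaviour, not the code in preprocess.py: the function names are hypothetical, lengths are counted in frames rather than seconds, and zero-padding the last short part is an assumption.

```python
from typing import List, Optional, Sequence


def fit_length(frames: Sequence[float], target: int) -> List[float]:
    """Cut `frames` to `target` items, or right-pad it with zeros."""
    out = list(frames[:target])
    out.extend([0.0] * (target - len(out)))
    return out


def split_or_fit(frames: Sequence[float],
                 subsong_frames: Optional[int],
                 song_frames: int) -> List[List[float]]:
    """Split into fixed-size parts, or cut/pad the whole song.

    If `subsong_frames` is None, a single part of `song_frames` items
    is returned. Otherwise the song is split into parts of
    `subsong_frames` items; the last part is zero-padded (assumption).
    """
    if subsong_frames is None:
        return [fit_length(frames, song_frames)]
    return [
        fit_length(frames[start:start + subsong_frames], subsong_frames)
        for start in range(0, len(frames), subsong_frames)
    ]


song = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(split_or_fit(song, subsong_frames=3, song_frames=5))
print(split_or_fit(song, subsong_frames=None, song_frames=5))
print(split_or_fit(song, subsong_frames=None, song_frames=10))
```

Fixed-length parts make it possible to stack songs of different durations into one batch.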

Models:

LSTM

How to use:

python train_rnn.py --model LSTM --conv_list TheBeatles180List_converted_librosa.txt

Optional parameters:

--num_epochs, default: 2
--learning_rate, default: 0.01
--weight_decay, default: 1e-5
--songs_list, default: data/tracklists/TheBeatles180List
--audio_root, default: data/audio/
--gt_root, default: data/gt/
--conv_root, default: data/converted/, folder where converted datasets are saved
--conv_list, if specified, converted audio from the list is used to fit the model and the conversion step is skipped
--category, default: MirexRoot
--subsong_len, default: 40, length in seconds of the parts each song is split into during preprocessing
--song_len, default: 180, if subsong_len is not specified, the song is cut or zero-padded to song_len
--hidden_dim, default: 200
--num_layers, default: 2
--batch_size, default: 4
--sch_step_size, default: 100, scheduler step size
--sch_gamma, default: 100, scheduler gamma
--, default: 10, evaluates the model on the train and validation sets every n iterations
--use_librosa, default: True
--save_model_as, if specified, the model is saved in the pretrained folder
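
The --category parameter selects the chord vocabulary. For MirexRoot only the chord root matters, so a label can be reduced to a pitch class. The sketch below assumes labels in the root:quality/bass form used by Isophonics annotations, with "N" meaning no chord; the function chord_root is hypothetical and is not the repository's own mapping.

```python
from typing import Optional

# Pitch classes of the natural notes (0 = C ... 11 = B).
NATURAL_PITCH = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def chord_root(label: str) -> Optional[int]:
    """Return the root pitch class of a chord label, or None for "N"."""
    if label == "N":
        return None  # no-chord segment
    # Drop the quality (after ":") and the bass note (after "/").
    root = label.split(":", 1)[0].split("/", 1)[0]
    pitch = NATURAL_PITCH[root[0]]
    for accidental in root[1:]:
        if accidental == "#":
            pitch += 1
        elif accidental == "b":
            pitch -= 1
        else:
            raise ValueError(f"unexpected root spelling: {root!r}")
    return pitch % 12


for label in ["C", "A:min", "Bb:maj7/3", "F#:7", "N"]:
    print(label, "->", chord_root(label))
```

With this reduction, "A:min" and "A:7" count as the same class, which is why root-level scores are typically higher than scores for richer vocabularies.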

Test

python test_nn.py --model pretrained/LSTM_MirexRoot_TheBeatles180_librosa.pkl --conv_root data/converted/librosa --conv_list TheBeatles180List_converted_librosa.txt

Results

Isophonics 2009

Model	MirexRoot	MirexMajMin	MirexMajMinBass	MirexSevenths	MirexSeventhsBass
BiLSTM	95.58%	94.59%	94.27%	90.86%	90.6%
BiLSTM with modulation	95.31%	93.43%	94.98%	91.67%	91.25%
BiGRU with modulation	93.69%	91.30%	90.75%	89.46%	87.43%

JayChou

Model	MirexRoot	MirexMajMin	MirexMajMinBass	MirexSevenths	MirexSeventhsBass
BiLSTM	67.7%	61.42%	59%	41%	39.33%
BiLSTM with modulation	72.04%	69.58%	66.09%	48.97%	44.98%
BiGRU with modulation	68.59%	65.27%	62.59%	45.08%	40.37%
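
Chord recognition scores of this kind are commonly computed as the share of annotated time on which the predicted and reference chords agree. The sketch below shows that duration-weighted comparison in simplified form: it uses exact label equality instead of the category-specific matching rules applied by MusOOEvaluator, assumes segments within each list do not overlap, and uses made-up data. The function name overlap_score is hypothetical.

```python
from typing import List, Tuple

# (start_sec, end_sec, chord_label)
Segment = Tuple[float, float, str]


def overlap_score(reference: List[Segment],
                  predicted: List[Segment]) -> float:
    """Fraction of reference time covered by an equal predicted label."""
    total = sum(end - start for start, end, _ in reference)
    matched = 0.0
    for r_start, r_end, r_label in reference:
        for p_start, p_end, p_label in predicted:
            # Length of the time interval shared by both segments.
            overlap = min(r_end, p_end) - max(r_start, p_start)
            if overlap > 0 and r_label == p_label:
                matched += overlap
    return matched / total if total > 0 else 0.0


# Made-up example: the chord change is predicted half a second late.
reference = [(0.0, 2.0, "C"), (2.0, 4.0, "G")]
predicted = [(0.0, 2.5, "C"), (2.5, 4.0, "G")]
print(f"{overlap_score(reference, predicted):.2%}")  # prints 87.50%
```

Weighting by duration means a long, correctly labelled chord contributes more to the score than a short one.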

Random forest

Accuracy on the test set: MirexRoot: 55%

How to use:

Train

python train_rf.py

Optional parameters:

--songs_list, default: data/tracklists/TheBeatles180List
--audio_root, default: data/audio/
--gt_root, default: data/gt/
--conv_root, default: data/converted/, folder where converted datasets are saved
--conv_list, if specified, converted audio from the list is used to fit the model and the conversion step is skipped
--category, default: MirexRoot
--subsong_len, default: 40, length in seconds of the parts each song is split into during preprocessing
--song_len, default: 180, if subsong_len is not specified, the song is cut or zero-padded to song_len
--criterion, default: entropy
--max_features, default: log2
--n_estimators, default: 1

Test

python test_rf.py --model pretrained/RF_MirexRoot_TheBeatles180_librosa.pkl --conv_root data/converted/librosa --conv_list TheBeatles180List_converted_librosa.txt

Optional parameters:

--songs_list, default: data/tracklists/TheBeatles180List
--audio_root, default: data/audio/
--gt_root, default: data/gt/
--conv_root, default: data/converted/, folder where converted datasets are saved
--conv_list, if specified, converted audio from the list is used and the conversion step is skipped
--category, default: MirexRoot
--subsong_len, default: 40, length in seconds of the parts each song is split into during preprocessing
--song_len, default: 180, if subsong_len is not specified, the song is cut or zero-padded to song_len

Applications

Realtime demo

python realtime.py --category MirexRoot

Web-application for chords recognition

python web_app/server.py
