RNN composer help

Here I share some scripts I used to help me on training char-rnn to learn to compose tunes in ABC notation.

I keep a blog called RNN music of the day where each day I posted one tune generated by this method.

I don't share my dataset as I'm not sure about possible copyright issues.

Pre-requisites

You'll need to install Torch and the other requirements from char-rnn. Just follow the instructions.

I rely heavily on system calls to get the job done, so you'll need to install some tools on your machine for these scripts to work:

Some python modules that can be installed by:

pip install beautifulsoup4 requests

Beautiful Soup is used by a crawler I wrote to get some more data.

We also need a dataset (obviously) and some soundfounts to convert the generated music to MP3.

Let's see we can get files to train the network.

Getting files to train the network

I searched the web with preference to huge ABC collecions. The most important files I found are these:

Combined_Tunebook.abc
HAN1.abc and HAN2.abc (Chinese tunes)
nyfte.abc
All collections from http://www.lesession.co.uk/music/
All collections from http://www.norbeck.nu/abc/
All collections from http://abc.sourceforge.net/NMD/
The entire Hanny Christen collection from https://www.dropbox.com/sh/flddgymslo57zvl/AAAz5s386GT0d4amGpHb8wDXa
Some random books from http://www.campin.me.uk/
Some random collections from http://www.cranfordpub.com/tunes/abcs/abc_tunes.htm
Most files with than 50 tunes from http://www.paulhardy.net/
Books 1 to 46 from http://trillian.mit.edu/~jc/music/book/SCD/
Tunes from a Dropbox user: use the get_u_4496965_data.py script. These are some nice tunes from

From the above links, I must emphasize the Hanny Christen collection of Swiss folk music for two reasons: it's huge with over 10000 tunes (about 4.4 MB after I preprocessed it) and all it's musics have chords, giving rise to richer compositions. I have indeed trained the network on this data alone and got great results.

If I forgot something, please open an issue and I'll update this.

My workflow

I'll just give an overview here. Each script has some helper, just pass the flag -h to them.

Use preprocess.py to concatenate and shuffle all files into one big file.
Train char-rnn on that file.
Sample trained network to generate a big file with compositons (using seqlen=100000 tipically).
Use file_to_midi_and_abc.py to convert that big file into many MIDI files and ABC files.
Use abc_to_mp3.py to convert the MIDI files into MP3 files.
Use abc_to_svg.py to convert the ABC files into SVG sheets.

Step 5 needs one more clarification: I used sounfount 754-Donnys Guitar to synthesize better sounding files.

Tips on training `char-rnn`

Be careful to not overfit. Specially when using small datasets. One time I found marvelous results on a 200 KB dataset just to discover that some tunes were plagiarism of the training set. One of the indications of overfitting is to have evaluation loss much larger than training loss. In that case, you should use less memory units or get a larger dataset.

allanino / rnn-composer-helper

RNN composer help

Pre-requisites

Getting files to train the network

My workflow

Tips on training `char-rnn`

About

Languages

RNN composer help

Pre-requisites

Getting files to train the network

My workflow

Tips on training char-rnn

About

Languages

Tips on training `char-rnn`