A Python tool for developing music models.
The structure is designed around a language modeling approach to musical generation. The problem of language modeling is as follows:
[LANGUAGE MODELING DESCRIPTION]
Trained language models can then be used to generate coherent sequences of text given some input context (REFERENCE GPT3). Note that while the objective of training is to learn a probability distribution over textual tokens conditioned on some input text, various alternatives to maximum likelihood sampling of output tokens have been used that produce text of higher quality as judged by human evaluators.
Although language models may exploit prior beliefs about the structure of natural language in order to achieve stronger performance (whatever that may mean to the researcher), many of the techniques can be applied to arbitrary sequences. Indeed, there has been recent progress adapting autoregressive RNN- and Transformer-based models, originally conceived for the task of language modeling, not only to musical sequences but also to seemingly remote problems such as time series forecasting, with some success (REFERENCE ENHANCING LOCALITY..., DEEPAR...).
Approaches to musical generation frequently start with a corpus of MIDI files which are then subjected to some encoding scheme which converts the MIDI data into a sequence of simplified textual tokens. These encoded representations are then used by a language model for training and evaluation. In order to listen to the resulting outputs, the tokenised representation of the musical sequence generated by the language model must be decoded into MIDI format.
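Once the MIDI data has been reduced to textual tokens, a language model typically works on integer ids via a vocabulary. As a hedged sketch (the function names here are illustrative, not part of Neural Seq):

```python
# Illustrative sketch: map encoded tokens to integer ids for a language
# model. Names (build_vocab, numericalize) are hypothetical.
def build_vocab(token_seqs):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for seq in token_seqs:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

def numericalize(seq, vocab):
    """Convert a token sequence to its integer-id representation."""
    return [vocab[tok] for tok in seq]

vocab = build_vocab([['P-42', 'D-4', 'P-42', 'D-4']])
ids = numericalize(['D-4', 'P-42'], vocab)  # -> [1, 0]
```

The decoding direction simply inverts the vocabulary and then maps tokens back to MIDI events.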
It is the hope of the author that users of Neural Seq, at least in the future, will be able to spend their time on the interesting parts of this process, namely devising novel encoding/decoding schemes for MIDI data and developing models, while having the tools at their disposal to iterate quickly and effortlessly sample the music generated by their work.
Note: The following assumes that you have git installed, are running Python 3.7 on macOS, and have configured a virtual environment of some description. To set up a virtual env using venv, see here.
First clone the repo into your local directory:
$ git clone https://github.com/aliroberts/neural-seq
Install Python dependencies:
$ cd neural-seq
$ pip install -r requirements.txt
Make the command line script executable for the current user:
$ chmod u+x nseq.sh
Let's explore Neural Seq and create a neural network-powered drum machine. To do so we perform the following steps:
- Download a collection of MIDI files from which our training data will be generated.
- Extract the relevant parts from the MIDI files and convert them to some textual representation (encoding)
- Train a model using the data
- Generate textual sequences from the model and convert them to MIDI (decoding) for playback
Run the following command to download a collection of MIDI files corresponding to the songs of various pop/rock artists from the last few decades:
$ ./nseq.sh fetch-data
The files will be downloaded to a .data directory inside the current one.
You can view a list of artists using the list-artists command:
$ ./nseq.sh list-artists --search phil\ col
Phil Collins
And list songs for a specified artist using the list-songs command:
$ ./nseq.sh list-songs --artist Phil\ Collins
A Groovy Kind of Love.mid
Against All Odds.mid
Another Day in Paradise.1.mid
Another Day in Paradise.2.mid
Another Day in Paradise.mid
Don't Lose My Number.mid
Easy Lover.mid
I Wish It Would Rain Down.mid
In The Air Tonight.1.mid
In The Air Tonight.mid
No Son of Mine.mid
One More Night.mid
Sussudio.mid
True Colors.mid
You Can't Hurry Love.mid
If we pass the --path option to the list-songs command then we can view the relative paths to the MIDI files:
$ ./nseq.sh list-songs --artist Phil\ Collins --path
.data/midi_data/clean_midi/Phil Collins/A Groovy Kind of Love.mid
.data/midi_data/clean_midi/Phil Collins/Against All Odds.mid
.data/midi_data/clean_midi/Phil Collins/Another Day in Paradise.1.mid
.data/midi_data/clean_midi/Phil Collins/Another Day in Paradise.2.mid
.data/midi_data/clean_midi/Phil Collins/Another Day in Paradise.mid
.data/midi_data/clean_midi/Phil Collins/Don't Lose My Number.mid
.data/midi_data/clean_midi/Phil Collins/Easy Lover.mid
.data/midi_data/clean_midi/Phil Collins/I Wish It Would Rain Down.mid
.data/midi_data/clean_midi/Phil Collins/In The Air Tonight.1.mid
.data/midi_data/clean_midi/Phil Collins/In The Air Tonight.mid
.data/midi_data/clean_midi/Phil Collins/No Son of Mine.mid
.data/midi_data/clean_midi/Phil Collins/One More Night.mid
.data/midi_data/clean_midi/Phil Collins/Sussudio.mid
.data/midi_data/clean_midi/Phil Collins/True Colors.mid
.data/midi_data/clean_midi/Phil Collins/You Can't Hurry Love.mid
We can listen to a specified MIDI file using the play-midi command:
$ ./nseq.sh play-midi .data/midi_data/clean_midi/Phil\ Collins/In\ The\ Air\ Tonight.mid
We can also isolate the part matching a specified instrument name (corresponding to General MIDI patch numbers/names for all instruments except drums, which use the name 'Drum Kit') and listen to that part on its own.
$ ./nseq.sh play-midi .data/midi_data/clean_midi/Phil\ Collins/In\ The\ Air\ Tonight.mid --filter drum
Next, let's extract the drum part from the songs of a selection of artists and encode it using a specified encoder. We can do this using the gen-dataset command, which also splits the resulting files into training, validation and test sets in the specified proportions.
First let's create a text file with the names of the artists whose songs we want to encode:
$ echo $'David Bowie\nPhil Collins\nDaft Punk\nMichael Jackson\nNew Order\nTalking Heads' > artists.txt
Next, let's create a dataset from the songs by the selected artists:
$ ./nseq.sh gen-dataset --artists artists.txt --dest drum_dataset --valid=0.2 --encoder simple_drum --filter drum
This command will create a drum_dataset/ directory in the current directory containing training, validation and test sets (here --valid=0.2 reserves 20% of the files for validation).
Note that in the dataset we downloaded there are some duplicate songs that will be ignored by the above command.
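Under the hood, a dataset split of this kind usually amounts to shuffling the encoded files with a fixed seed and slicing off the requested fractions. A hedged sketch (the actual gen-dataset implementation and its proportions may differ):

```python
# Illustrative train/valid/test split over a list of encoded files.
# The fractions and seeded shuffle are assumptions, not Neural Seq's code.
import random

def split_dataset(files, valid_frac=0.15, test_frac=0.15, seed=0):
    files = sorted(files)                 # deterministic starting order
    random.Random(seed).shuffle(files)    # reproducible shuffle
    n_valid = int(len(files) * valid_frac)
    n_test = int(len(files) * test_frac)
    valid = files[:n_valid]
    test = files[n_valid:n_valid + n_test]
    train = files[n_valid + n_test:]
    return train, valid, test
```

Deduplication (e.g. dropping the numbered `.1.mid` variants seen above) would happen before the split.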
The encoder argument can take the name of any file in the src/encoders/ directory. The supplied file will be searched and the first subclass of BaseEncoder that is found will be picked out.
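This kind of lookup can be done with the standard importlib and inspect modules. A hedged sketch of the idea (Neural Seq's actual discovery logic may differ; BaseEncoder here is a stand-in):

```python
# Illustrative encoder discovery: import a module by name and return the
# first BaseEncoder subclass defined in it. Names are assumptions.
import importlib
import inspect

class BaseEncoder:
    """Stand-in for the real src.encoders.BaseEncoder."""

def find_encoder(module_name):
    module = importlib.import_module(module_name)
    for _, obj in inspect.getmembers(module, inspect.isclass):
        if issubclass(obj, BaseEncoder) and obj is not BaseEncoder:
            return obj
    raise ValueError('No BaseEncoder subclass found in %s' % module_name)
```

Note that inspect.getmembers returns classes in alphabetical order, so "first" here means first alphabetically rather than first in the file.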
The simple_drum encoder will convert the MIDI file (supplied to it as a PrettyMIDI object) into a list of 'action' and 'duration' tokens that specify the pitch (the drum in this case) that is played and the duration until the next set of pitches is played. This is essentially a polyphonic encoding scheme, but since the expected instrument is percussive and the sounds are short-lived, all note lengths are fixed to a 16th-note duration.
For example:
import pretty_midi
# Load a PrettyMidi object for the specified file. This is the sort of object that an Encoder instance will receive.
midi_data = pretty_midi.PrettyMIDI('.data/midi_data/clean_midi/Phil Collins/In The Air Tonight.mid')
# Pick out the first 8 notes associated with the drum kit
drum_kit = midi_data.instruments[0]
drum_kit.notes[:8]
[Note(start=2.500000, end=2.519531, pitch=42, velocity=52),
Note(start=3.125000, end=3.144531, pitch=42, velocity=62),
Note(start=3.750000, end=3.769531, pitch=42, velocity=68),
Note(start=4.375000, end=4.394531, pitch=42, velocity=66),
Note(start=5.000000, end=5.019531, pitch=46, velocity=40),
Note(start=5.312500, end=5.332031, pitch=64, velocity=30),
Note(start=5.312500, end=5.332031, pitch=50, velocity=50),
Note(start=5.312500, end=5.332031, pitch=42, velocity=54)]
The tempo of the song is 96 BPM (which can be extracted via the PrettyMIDI object), which means each 16th note lasts (60 / 96 / 4) = 0.15625 seconds. Snapping the sequence to the nearest 16th-note time value and encoding the resulting notes according to this scheme results in the token sequence
["D-16", "P-42", "D-4", "P-42", "D-4", "P-42", "D-4", "P-42", "D-4", "P-46", "D-2", "P-50", "P-42", "P-64", "D-4", "P-64", "P-42", "P-50", "D-4", "P-41", "P-35", "P-42", "D-2", "P-64", "P-46", "P-50", "D-2"]
Now that we have a dataset, we can train a model. We'll train an AWD-LSTM (Merity et al. 2017) language model.
$ ./nseq.sh train --data-dir drum_dataset --model awd_lstm --dest drum_model --bptt 32 --epochs 10 --bs 64 --lr 0.3 --when 5,10
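Of these flags, --bptt and --bs shape how the token stream reaches the model: the stream is typically reshaped into bs parallel streams and read off in windows of bptt tokens for truncated backpropagation through time, with each window's targets being the inputs shifted by one token. A hedged, framework-free sketch of that batching (not Neural Seq's actual data loader):

```python
# Illustrative BPTT batching: split one token stream into `bs` parallel
# streams, then yield (input, target) windows of up to `bptt` tokens.
def bptt_batches(tokens, bs, bptt):
    n = len(tokens) // bs  # trim so the stream divides evenly into bs streams
    streams = [tokens[i * n:(i + 1) * n] for i in range(bs)]
    for start in range(0, n - 1, bptt):
        end = min(start + bptt, n - 1)
        x = [s[start:end] for s in streams]
        y = [s[start + 1:end + 1] for s in streams]  # next-token targets
        yield x, y
```

The --when 5,10 flag, by analogy with the AWD-LSTM training recipe, looks like a learning-rate schedule (epochs at which the rate is dropped), though the exact semantics are defined by the train command.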
Models can be found in the src/models directory.
TODO