A Python tool for developing music models.
The structure is designed around a language modeling approach to musical generation. The problem of language modeling is as follows:
[LANGUAGE MODELING DESCRIPTION]
Trained language models can then be used to generate coherent sequences of text given some input context (REFERENCE GPT3). Note that while the objective of training is to learn a probability distribution over textual tokens conditioned on some input text, various alternatives to maximum likelihood sampling of output tokens have been used that produce text of higher quality as judged by human evaluators.
Although language models may exploit prior beliefs about the structure of natural language in order to achieve stronger performance (whatever that may mean to the researcher), many of the techniques can be applied to arbitrary sequences. Indeed, there has been recent progress adapting autoregressive RNN- and Transformer-based models, originally conceived for the task of language modeling, not only to musical sequences but also to seemingly remote problems such as time series forecasting, with some success (REFERENCE ENHANCING LOCALITY..., DEEPAR...).
Approaches to musical generation frequently start with a corpus of MIDI files which are then subjected to some encoding scheme which converts the MIDI data into a sequence of simplified textual tokens. These encoded representations are then used by a language model for training and evaluation. In order to listen to the resulting outputs, the tokenised representation of the musical sequence generated by the language model must be decoded into MIDI format.
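Once the MIDI data has been reduced to textual tokens, a language model typically works on integer ids via a vocabulary. As a hedged sketch (the function names here are illustrative, not part of Neural Seq):

```python
# Illustrative sketch: map encoded tokens to integer ids for a language
# model. Names (build_vocab, numericalize) are hypothetical.
def build_vocab(token_seqs):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for seq in token_seqs:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

def numericalize(seq, vocab):
    """Convert a token sequence to its integer-id representation."""
    return [vocab[tok] for tok in seq]

vocab = build_vocab([['P-42', 'D-4', 'P-42', 'D-4']])
ids = numericalize(['D-4', 'P-42'], vocab)  # -> [1, 0]
```

The decoding direction simply inverts the vocabulary and then maps tokens back to MIDI events.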
It is the hope of the author that users of Neural Seq, at least in the future, will be able to spend their time on the interesting parts of this process, namely devising novel encoding/decoding schemes for MIDI data and developing models, while having the tools at their disposal to iterate quickly and effortlessly sample the music generated by their work.
Note: The following assumes that you have git installed, are running Python 3.7 on macOS, and have configured a virtual environment of some description. To set up a virtual env using venv, see here.
First clone the repo into your local directory:
$ git clone https://github.com/aliroberts/neural-seq
Install Python dependencies:
$ cd neural-seq
$ pip install -r requirements.txt
Make the command line script executable for the current user:
$ chmod u+x nseq.sh
Let's explore Neural Seq and create a neural network-powered drum machine. To do so we perform the following steps:
- Download a collection of MIDI files from which our training data will be generated.
- Extract the relevant parts from the MIDI files and convert them to some textual representation (encoding)
- Train a model using the data
- Generate textual sequences from the model and convert them to MIDI (decoding) for playback
Run the following command to download a collection of MIDI files corresponding to the songs of various pop/rock artists from the last few decades:
$ ./nseq.sh fetch-data
The files will be downloaded to a .data directory inside the current one.
You can view a list of artists using the list-artists command:
$ ./nseq.sh list-artists --search phil\ col
Phil Collins
And list songs for a specified artist using the list-songs command:
$ ./nseq.sh list-songs --artist Phil\ Collins
A Groovy Kind of Love.mid
Against All Odds.mid
Another Day in Paradise.1.mid
Another Day in Paradise.2.mid
Another Day in Paradise.mid
Don't Lose My Number.mid
Easy Lover.mid
I Wish It Would Rain Down.mid
In The Air Tonight.1.mid
In The Air Tonight.mid
No Son of Mine.mid
One More Night.mid
Sussudio.mid
True Colors.mid
You Can't Hurry Love.mid
If we pass the --path option to the list-songs command then we can view the relative paths to the MIDI files:
$ ./nseq.sh list-songs --artist Phil\ Collins --path
.data/midi_data/clean_midi/Phil Collins/A Groovy Kind of Love.mid
.data/midi_data/clean_midi/Phil Collins/Against All Odds.mid
.data/midi_data/clean_midi/Phil Collins/Another Day in Paradise.1.mid
.data/midi_data/clean_midi/Phil Collins/Another Day in Paradise.2.mid
.data/midi_data/clean_midi/Phil Collins/Another Day in Paradise.mid
.data/midi_data/clean_midi/Phil Collins/Don't Lose My Number.mid
.data/midi_data/clean_midi/Phil Collins/Easy Lover.mid
.data/midi_data/clean_midi/Phil Collins/I Wish It Would Rain Down.mid
.data/midi_data/clean_midi/Phil Collins/In The Air Tonight.1.mid
.data/midi_data/clean_midi/Phil Collins/In The Air Tonight.mid
.data/midi_data/clean_midi/Phil Collins/No Son of Mine.mid
.data/midi_data/clean_midi/Phil Collins/One More Night.mid
.data/midi_data/clean_midi/Phil Collins/Sussudio.mid
.data/midi_data/clean_midi/Phil Collins/True Colors.mid
.data/midi_data/clean_midi/Phil Collins/You Can't Hurry Love.mid
We can listen to a specified MIDI file using the play-midi command:
$ ./nseq.sh play-midi .data/midi_data/clean_midi/Phil\ Collins/In\ The\ Air\ Tonight.mid
We can also isolate the part matching a specified instrument name (corresponding to General MIDI patch numbers/names for all instruments except drums, which use the name 'Drum Kit') and listen to that part on its own.
$ ./nseq.sh play-midi .data/midi_data/clean_midi/Phil\ Collins/In\ The\ Air\ Tonight.mid --filter drum
Next, let's extract the drum part from the songs of a selection of artists and encode it using a specified encoder. We can do this using the gen-dataset command, which also splits the resulting files into training, validation and test sets in the specified proportions.
First let's create a text file with the names of the artists whose songs we want to encode:
$ echo $'David Bowie\nPhil Collins\nDaft Punk\nMichael Jackson\nNew Order\nTalking Heads' > artists.txt
Next, let's create a dataset from the songs by the selected artists:
$ ./nseq.sh gen-dataset --artists artists.txt --dest drum_dataset --valid=0.2 --encoder simple_drum --filter drum
This command will create a drum_dataset/ directory in the current directory containing training, validation and test sets (here --valid=0.2 reserves 20% of the files for validation).
Note that in the dataset we downloaded there are some duplicate songs that will be ignored by the above command.
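Under the hood, a dataset split of this kind usually amounts to shuffling the encoded files with a fixed seed and slicing off the requested fractions. A hedged sketch (the actual gen-dataset implementation and its proportions may differ):

```python
# Illustrative train/valid/test split over a list of encoded files.
# The fractions and seeded shuffle are assumptions, not Neural Seq's code.
import random

def split_dataset(files, valid_frac=0.15, test_frac=0.15, seed=0):
    files = sorted(files)                 # deterministic starting order
    random.Random(seed).shuffle(files)    # reproducible shuffle
    n_valid = int(len(files) * valid_frac)
    n_test = int(len(files) * test_frac)
    valid = files[:n_valid]
    test = files[n_valid:n_valid + n_test]
    train = files[n_valid + n_test:]
    return train, valid, test
```

Deduplication (e.g. dropping the numbered `.1.mid` variants seen above) would happen before the split.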
The encoder argument can take the name of any file in the src/encoders/ directory. The supplied file will be searched and the first subclass of BaseEncoder that is found will be picked out.
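This kind of lookup can be done with the standard importlib and inspect modules. A hedged sketch of the idea (Neural Seq's actual discovery logic may differ; BaseEncoder here is a stand-in):

```python
# Illustrative encoder discovery: import a module by name and return the
# first BaseEncoder subclass defined in it. Names are assumptions.
import importlib
import inspect

class BaseEncoder:
    """Stand-in for the real src.encoders.BaseEncoder."""

def find_encoder(module_name):
    module = importlib.import_module(module_name)
    for _, obj in inspect.getmembers(module, inspect.isclass):
        if issubclass(obj, BaseEncoder) and obj is not BaseEncoder:
            return obj
    raise ValueError('No BaseEncoder subclass found in %s' % module_name)
```

Note that inspect.getmembers returns classes in alphabetical order, so "first" here means first alphabetically rather than first in the file.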
The simple_drum encoder will convert the MIDI file (supplied to it as a PrettyMIDI object) into a list of 'action' and 'duration' tokens that specify the pitch (the drum in this case) that is played and the duration until the next set of pitches is played. This is essentially a polyphonic encoding scheme, but since the expected instrument is percussive and the sounds are short-lived, all note lengths are fixed to a 16th-note duration.
For example:
import pretty_midi
# Load a PrettyMidi object for the specified file. This is the sort of object that an Encoder instance will receive.
midi_data = pretty_midi.PrettyMIDI('.data/midi_data/clean_midi/Phil Collins/In The Air Tonight.mid')
# Pick out the first 8 notes associated with the drum kit
drum_kit = midi_data.instruments[0]
drum_kit.notes[:8]
[Note(start=2.500000, end=2.519531, pitch=42, velocity=52),
Note(start=3.125000, end=3.144531, pitch=42, velocity=62),
Note(start=3.750000, end=3.769531, pitch=42, velocity=68),
Note(start=4.375000, end=4.394531, pitch=42, velocity=66),
Note(start=5.000000, end=5.019531, pitch=46, velocity=40),
Note(start=5.312500, end=5.332031, pitch=64, velocity=30),
Note(start=5.312500, end=5.332031, pitch=50, velocity=50),
Note(start=5.312500, end=5.332031, pitch=42, velocity=54)]
The tempo of the song is 96 BPM (which can be extracted via the PrettyMIDI object), which means each 16th note lasts (60 / 96 / 4) = 0.15625 seconds. Snapping the sequence to the nearest 16th-note time value and encoding the resulting notes according to this scheme results in the token sequence
["D-16", "P-42", "D-4", "P-42", "D-4", "P-42", "D-4", "P-42", "D-4", "P-46", "D-2", "P-50", "P-42", "P-64", "D-4", "P-64", "P-42", "P-50", "D-4", "P-41", "P-35", "P-42", "D-2", "P-64", "P-46", "P-50", "D-2"]
Now that we have a dataset, we can train a model. We'll train an AWD-LSTM (Merity et al. 2017) language model.
$ ./nseq.sh train --data-dir drum_dataset --model awd_lstm --dest drum_model --bptt 32 --epochs 10 --bs 64 --lr 0.3 --when 5,10
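Of these flags, --bptt and --bs shape how the token stream reaches the model: the stream is typically reshaped into bs parallel streams and read off in windows of bptt tokens for truncated backpropagation through time, with each window's targets being the inputs shifted by one token. A hedged, framework-free sketch of that batching (not Neural Seq's actual data loader):

```python
# Illustrative BPTT batching: split one token stream into `bs` parallel
# streams, then yield (input, target) windows of up to `bptt` tokens.
def bptt_batches(tokens, bs, bptt):
    n = len(tokens) // bs  # trim so the stream divides evenly into bs streams
    streams = [tokens[i * n:(i + 1) * n] for i in range(bs)]
    for start in range(0, n - 1, bptt):
        end = min(start + bptt, n - 1)
        x = [s[start:end] for s in streams]
        y = [s[start + 1:end + 1] for s in streams]  # next-token targets
        yield x, y
```

The --when 5,10 flag, by analogy with the AWD-LSTM training recipe, looks like a learning-rate schedule (epochs at which the rate is dropped), though the exact semantics are defined by the train command.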
Models can be found in the src/models directory.
TODO