REMI

Authors: Yu-Siang Huang, Yi-Hsuan Yang and Wen-Yi Hsiao

Paper (arXiv) | Blog | Audio demo (Google Drive)

REMI, which stands for REvamped MIDI-derived events, is a new event representation we propose for converting MIDI scores into text-like discrete tokens. Compared to the MIDI-like event representation adopted in exising Transformer-based music composition models, REMI provides sequence models a metrical context for modeling the rhythmic patterns of music. Using REMI as the event representation, we train a Transformer-XL model to generate minute-long Pop piano music with expressive, coherent and clear structure of rhythm and harmony, without needing any post-processing to refine the result. The model also provides controllability of local tempo changes and chord progression.

Citation

@article{huang2020pop,
  title={Pop music transformer: Generating music with rhythm and harmony},
  author={Huang, Yu-Siang and Yang, Yi-Hsuan},
  journal={arXiv preprint arXiv:2002.00212},
  year={2020}
}

Getting Started

Install Dependencies

python 3.6 (recommend using Anaconda)
tensorflow-gpu 1.14.0 (pip install tensorflow-gpu==1.14.0)
miditoolkit (pip install miditoolkit)

Download Pre-trained Checkpoints

We provide two pre-trained checkpoints for generating samples.

REMI-tempo-checkpoint (428 MB)
REMI-tempo-chord-checkpoint (429 MB)

Obtain the MIDI Data

We provide the MIDI files including local tempo changes and estimated chord. (5 MB)

data/train: 775 files used for training models
data/evaluation: 100 files (prompts) used for the continuation experiments

Generate Samples

See main.py as an example:

from model import PopMusicTransformer
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

def main():
    # declare model
    model = PopMusicTransformer(checkpoint='REMI-tempo-checkpoint')
    # generate from scratch
    model.generate(
        n_target_bar=16,
        temperature=1.2,
        topk=5,
        output_path='./result/from_scratch.midi',
        prompt=None)
    # generate continuation
    model.generate(
        n_target_bar=16,
        temperature=1.2,
        topk=5,
        output_path='./result/continuation.midi',
        prompt='./data/evaluation/000.midi')
    # close model
    model.close()

if __name__ == '__main__':
    main()

Convert MIDI to REMI

You can find out how to convert the MIDI messages into REMI events in the midi2remi.ipynb.

FAQ

1. How to synthesize the audio files (e.g., mp3)?

We strongly recommend using DAW (e.g., Logic Pro) to open/play the generated MIDI files. Or, you can use FluidSynth with a SoundFont. However, it may not be able to correctly handle the tempo changes (see fluidsynth/issues/141).

2. What is the function of the inputs "temperature" and "topk"?

It is the temperature-controlled stochastic sampling methods are used for generating text from a trained language model. You can find out more details in the reference paper CTRL: 4.1 Sampling.

It is worth noting that the sampling method used for generation is very critical to the quality of the output, which is a research topic worthy of further exploration.

Acknowledgement

The content of modules.py comes from the kimiyoung/transformer-xl repository.

LiangHsia / remi