JS Fake Chorales

A MIDI dataset of 500 four-part chorales generated by the KS_Chorus algorithm, annotated with the results of listening tests taken by hundreds of participants, plus 700 further unannotated chorales.

The JS Fake Chorales will be periodically updated with new unannotated chorales generated by the community with the JS Fake Chorale Generator.

Metadata Structure

The annotations are stored as a nested dictionary in metadata/js-fakes-dataset.pkl, with the following structure:

{
    'midi/0.mid': {  # path to the midi file ('0.mid'-'499.mid')
        'skill_0': {  # self-reported domain expertise level of the respondent ('skill_0'-'skill_5', and 'TOTAL')
            'responses': Int,  # total responses
            'correct': Int,  # responses which correctly identified the sample as composed by A.I.
            'ave_plays': Float, # average number of times the sample was played before a response was submitted
            'ave_time': Float  # average time in seconds between hearing the sample and submitting a response
        }
    }
}
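The following sketch reads the annotations and ranks pieces by how rarely listeners identified them as A.I.-composed. It assumes the file loads with Python's standard pickle module and does not assume that every piece has every skill group. The helper names (load_metadata, detection_rate, hardest_to_detect) are hypothetical and not part of the dataset.

```python
import math
import pickle
from typing import Dict, List


def load_metadata(path: str = "metadata/js-fakes-dataset.pkl") -> Dict[str, dict]:
    """Load the nested annotation dictionary (hypothetical helper)."""
    with open(path, "rb") as f:
        # encoding="latin1" only matters if the pickle was written by Python 2 (an assumption)
        return pickle.load(f, encoding="latin1")


def detection_rate(metadata: Dict[str, dict], midi_path: str, group: str = "TOTAL") -> float:
    """Share of responses in `group` that correctly flagged the piece as A.I.-composed."""
    stats = metadata[midi_path].get(group)
    if not stats or stats["responses"] == 0:
        return math.nan  # no data for this piece/group
    return stats["correct"] / stats["responses"]


def hardest_to_detect(metadata: Dict[str, dict], group: str = "TOTAL", k: int = 5) -> List[str]:
    """Return up to k MIDI paths with the lowest detection rate for `group`."""
    rates = {path: detection_rate(metadata, path, group) for path in metadata}
    # Drop pieces with no responses in this group (rate is NaN)
    scored = {path: r for path, r in rates.items() if not math.isnan(r)}
    return sorted(scored, key=scored.get)[:k]
```

For example, hardest_to_detect(load_metadata(), group="skill_5") would list the pieces that the most expert respondents found hardest to flag, assuming 'skill_5' is the highest self-reported level.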

16th Note Split Version

We additionally provide the dataset pre-sliced into 16th-note time steps in js-fakes-16thSeparated.npz. When loaded, it behaves like a dictionary with two keys, "pitches" and "chords", each mapping to a numpy array of 500 sequences. For "pitches", each sequence is one piece from the JS Fakes, stored as a list of time steps. Each time step holds exactly four numbers: one pitch for each of the SATB voices. If a voice is silent at a given time step, its pitch is -1. For "chords", the format is the same, except that each time step holds a single value representing a chord, encoded as per (Peracha, 2020). The sequence at each index of "chords" corresponds to the piece at the same index of "pitches".
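Because the sliced format stores one pitch per voice per 16th-note step, recovering note events means merging runs of equal pitches. The sketch below does this for a single voice. It assumes the four columns are in SATB order and treats a repeated pitch as one held note, since the sliced format cannot distinguish a tie from a re-struck note. voice_to_notes is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

SILENT = -1  # pitch value for a silent voice, as described above
VOICES = ("soprano", "alto", "tenor", "bass")  # assumed column order (SATB)


def voice_to_notes(piece: Sequence[Sequence[int]], voice: int) -> List[Tuple[int, int, int]]:
    """Merge consecutive 16th-note steps of one voice into (pitch, onset, duration) tuples.

    Onset and duration are measured in 16th-note steps. Assumption: equal consecutive
    pitches form one sustained note; silent steps (-1) produce no note.
    """
    notes: List[Tuple[int, int, int]] = []
    current, start = None, 0
    for t, step in enumerate(piece):
        pitch = int(step[voice])  # int() also accepts numpy integer types
        if pitch != current:
            # Close the previous run unless it was a rest
            if current is not None and current != SILENT:
                notes.append((current, start, t - start))
            current, start = pitch, t
    # Close the final run
    if current is not None and current != SILENT:
        notes.append((current, start, len(piece) - start))
    return notes
```

With four 16th-note steps per quarter note, a duration of 4 corresponds to one quarter note.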

To load the dataset in this format in Python 3:

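import numpy as np

# allow_pickle=True is needed because the sequences are stored as pickled Python objects;
# encoding='latin1' lets NumPy read pickled data written by Python 2
jsf = np.load('js-fakes-16thSeparated.npz', allow_pickle=True, encoding='latin1')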
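Once loaded, it can help to check that the "pitches" and "chords" sequences line up before training on them. The sketch below assumes, as the description implies, that each chord sequence has one entry per pitch time step. The helpers (load_jsf, iter_aligned, summarize) are hypothetical and work on the loaded arrays or on plain nested lists.

```python
from typing import Dict, Iterator, Mapping, Sequence, Tuple


def load_jsf(path: str = "js-fakes-16thSeparated.npz") -> Dict[str, object]:
    """Load both arrays into memory and close the file (hypothetical helper)."""
    import numpy as np  # imported here so the helpers below also work without numpy
    with np.load(path, allow_pickle=True, encoding="latin1") as data:
        return {key: data[key] for key in ("pitches", "chords")}


def iter_aligned(jsf: Mapping[str, Sequence]) -> Iterator[Tuple[int, Sequence, Sequence]]:
    """Yield (index, pitch_steps, chord_steps) per piece, checking the assumed alignment."""
    if len(jsf["pitches"]) != len(jsf["chords"]):
        raise ValueError("'pitches' and 'chords' hold different numbers of pieces")
    for i, (pitches, chords) in enumerate(zip(jsf["pitches"], jsf["chords"])):
        if len(pitches) != len(chords):
            raise ValueError(f"piece {i}: {len(pitches)} pitch steps vs {len(chords)} chord steps")
        if any(len(step) != 4 for step in pitches):
            raise ValueError(f"piece {i}: expected four SATB pitches per time step")
        yield i, pitches, chords


def summarize(jsf: Mapping[str, Sequence]) -> Dict[str, int]:
    """Count pieces and 16th-note time steps."""
    lengths = [len(pitches) for _, pitches, _ in iter_aligned(jsf)]
    if not lengths:
        return {"pieces": 0, "min_steps": 0, "max_steps": 0, "total_steps": 0}
    return {
        "pieces": len(lengths),
        "min_steps": min(lengths),
        "max_steps": max(lengths),
        "total_steps": sum(lengths),
    }
```

On the full file, summarize(load_jsf()) should report 500 pieces, matching the description above.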

JSF-Extended

The unannotated chorales generated by the community with the JS Fake Chorale Generator are presently provided in MIDI format only, in the jsf-extended/ directory. Currently, 700 JSF-Extended chorales are available.
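A minimal sketch for collecting the JSF-Extended files, assuming only that they sit under jsf-extended/ with a .mid or .midi extension. The file naming scheme is not documented here, so the files are simply sorted by path. list_extended_chorales is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def list_extended_chorales(root: str = "jsf-extended") -> List[Path]:
    """Return all MIDI files under `root` (searched recursively), sorted by path."""
    base = Path(root)
    if not base.is_dir():
        raise FileNotFoundError(f"directory not found: {base}")
    # Accept either extension, case-insensitively (an assumption about file naming)
    return sorted(
        p for p in base.rglob("*") if p.is_file() and p.suffix.lower() in {".mid", ".midi"}
    )
```

Because the directory grows as the community contributes, counting the result is more reliable than relying on a fixed number.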

About

Paper: https://arxiv.org/abs/2107.10388

License: Creative Commons Attribution 4.0 International