Natooz / MidiTok

MIDI / symbolic music tokenizers for Deep Learning models 🎶

Home Page: https://miditok.readthedocs.io/

When using REMI for tokenization with use_time_signatures=True, many duplicate measures can be encoded.

Chunyuan-Li opened this issue · comments

miditok version

2.1.5

Problem summary

When using REMI for tokenization with use_time_signatures=True, many duplicate measures can be encoded.

Steps to reproduce

from miditok import REMI, TokenizerConfig

config = TokenizerConfig(use_tempos=True, nb_tempos=240, tempo_range=(20, 259), use_rests=True, use_time_signatures=True, time_signature_range={8: (3, 12), 4: (1, 6)}, use_programs=False)
tokenizer = REMI(config)
tokens = tokenizer(midi_path)  # midi_path: path to the MIDI file to tokenize
print("MIDI tokens:\n", tokens)

Expected vs. actual behavior

MIDI tokens:
[TokSequence(tokens=[[..., 'Velocity_79', 'Duration_1.0.8', 'Bar_None', 'TimeSig_4/4', 'Bar_None', 'TimeSig_4/4', 'Bar_None', 'TimeSig_4/4', 'Bar_None', 'TimeSig_4/4', 'Bar_None', 'TimeSig_4/4', 'Bar_None', 'TimeSig_4/4', 'Bar_None', 'TimeSig_4/4', 'Position_0', 'Position_3', 'Pitch_37', 'Velocity_79', 'Duration_1.0.8', 'Position_7', 'Pitch_66', 'Velocity_79', 'Duration_2.0.8', 'Position_11', ...]

debug

/usr/local/lib/python3.10/dist-packages/miditok/tokenizations/remi.py, line 101

            if event.time != previous_tick:
                # (Rest)
                if (
                    self.config.use_rests
                    and event.time - previous_note_end >= self._min_rest
                ):
                    previous_tick = previous_note_end
                    rest_values = self._ticks_to_duration_tokens(
                        event.time - previous_tick, rest=True
                    )
                    for dur_value, dur_ticks in zip(*rest_values):
                        all_events.append(
                            Event(
                                type="Rest",
                                value=".".join(map(str, dur_value)),
                                time=previous_tick,
                                desc=f"{event.time - previous_tick} ticks",
                            )
                        )
                        previous_tick += dur_ticks
                    current_bar = previous_tick // ticks_per_bar

If previous_note_end > event.time, current_bar will not be updated.
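The idea behind the fix attempted here can be sketched as follows. This is an illustrative sketch only, not the actual patch merged later; it assumes bars of constant length anchored at tick 0, and all names are hypothetical:

```python
# Illustrative sketch (not the actual fix in miditok): derive the current
# bar from the absolute event time, so it stays up to date even when the
# rest-handling branch above is skipped.
def bar_index(event_time: int, ticks_per_bar: int) -> int:
    """Bar index of an event, assuming equal-length bars starting at tick 0."""
    return event_time // ticks_per_bar
```

For instance, with 4/4 bars of 1920 ticks, an event at tick 4000 falls in bar 2 whether or not a Rest token was emitted before it.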

I also found that the number of measures in the resulting tokens is lower than expected. For example, I have a MusicXML file with 210 measures, which I converted to MIDI using MuseScore (so tempo and time signature information should have been retained). After running the REMI tokenizer, the tokens only contain 201 measures.

Hi,
Thanks again for the bug report!
Could you share the MIDI file that raised the bug?
I'll be able to inspect what's going wrong here :)

Edit: This might be linked to #73 (fixed in #75)

This problem usually occurs in MIDI files with a large number of measures.
When I tried to fix the bug myself (by updating current_bar), the REMI tokenizer produced 107 measures, while the original MXL file has 124.
6354774_Macabre Waltz.zip

Thank you for the file! This was indeed a bug that required modifying the time encoding/decoding quite a bit. Thanks again for spotting it!
I managed to track down what was going wrong; it is fixed in #76 for REMI and CPWord. I'm working on fixing it for Octuple and MMM as well.

I'll include this MIDI file in the MIDI test cases (unless you'd rather I didn't).

Fix merged!
I'll run a few more tests to make sure everything is good before releasing the update

Edit: I tested with additional files containing multiple time signature changes, and everything seems alright. Do you have other MIDI files you would like tested? If you would like to try it yourself, you can use the test_one_track.py script in the tests.

The MIDI tokens look normal now, but I'm a bit confused: the original MusicXML file has 124 measures, yet the tokens produced by the REMI tokenizer only contain 107 measures (for [6354774_Macabre Waltz]). Why is this happening?

Are you using the latest commit on the main branch?
The tokenization --> detokenization round trip works fine for me, with 121 measures.
Here is a comparison (the first track is the original one, untouched): 6354774_Macabre Waltz.mid.zip

I'm on the latest commit of the main branch, but

tokens = tokenizer(midi_path)
len(' '.join(tokens[0].tokens).split('Bar_None'))

returns 93.

Yes, this is because of the use of rests. Since some rests may last longer than a bar, some bars do not get a Bar token (they are essentially skipped).
If you disable rests, you should end up with the right number using your method. Otherwise, another way to count bars would be to parse the time signature changes and count the number of bars between them.
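The suggested alternative could be sketched as follows. This is a hedged sketch, not the miditok API: the (tick, numerator, denominator) layout of ts_changes and all names are illustrative assumptions.

```python
# Count bars by walking the time signature changes instead of counting
# Bar_None tokens, since rests longer than a bar can skip Bar tokens.
# `ts_changes` is a list of (start_tick, numerator, denominator) tuples,
# sorted by tick; `end_tick` marks the end of the music.
def count_bars(ts_changes, end_tick, ticks_per_beat):
    total = 0
    for i, (tick, num, den) in enumerate(ts_changes):
        # Bar length in ticks for a num/den time signature
        ticks_per_bar = ticks_per_beat * 4 * num // den
        next_tick = ts_changes[i + 1][0] if i + 1 < len(ts_changes) else end_tick
        # Ceiling division: a partially filled final bar still counts
        total += -(-(next_tick - tick) // ticks_per_bar)
    return total
```

For example, a single 4/4 signature at tick 0 with the music ending at tick 19200 (at 480 ticks per beat) yields 10 bars.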

Yes, you're right. I get 122 with use_rests=False. Strangely, the corresponding MusicXML has 124 measures, and the MIDI was obtained by converting that MXL file with MuseScore. Why doesn't it match here?

I don't know for sure, but maybe some rests were "cut" at the beginning or the end of the original MusicXML file? 🤷‍♂️

Maybe. I'll keep looking for the reason. At least the REMI tokenizer is working properly now.

Hi, I think I have found the reason. First, I made a mistake: when converting MusicXML to MIDI with MuseScore, the output has two tracks, but I only took the first one, which resulted in fewer measures (as with 6354774_Macabre Waltz). Second, MuseScore merges adjacent notes, so there can be a long note at the end that the REMI tokenizer does not split; the duration of some tokens then exceeds the time signature, which also makes the measure count inconsistent.

debug

from miditok import REMI, TokenizerConfig
from miditoolkit import MidiFile

config = TokenizerConfig(use_tempos=True, nb_tempos=340, tempo_range=(20, 359), use_time_signatures=True, time_signature_range={8: (3,12), 4: (1,8)}, beat_res={(0,8): 8, (8,12): 4}, use_programs=True, one_token_stream=True)
tokenizer = REMI(config)
midi = MidiFile('6338816_Étude No. 4.mid')
tokens = tokenizer(midi)
converted_midi = tokenizer(tokens)  # calling the tokenizer on tokens detokenizes
converted_midi.dump('test_converted.mid')

notes in last bar (one track)

 Note(start=48240, end=48720, pitch=73, velocity=79),
 Note(start=48240, end=48720, pitch=77, velocity=79),
 Note(start=48240, end=48720, pitch=81, velocity=79),
 Note(start=49680, end=50160, pitch=73, velocity=79),
 Note(start=49680, end=50160, pitch=77, velocity=79),
 Note(start=49680, end=50160, pitch=81, velocity=79),
 Note(start=51120, end=55200, pitch=73, velocity=79),
 Note(start=51120, end=55200, pitch=77, velocity=79),
 Note(start=51120, end=55200, pitch=81, velocity=79)]
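The long final chord can also be spotted programmatically. A hedged sketch, where the bar start tick (47520) and the 3840-tick bar length (an 8/4 bar at 480 ticks per beat) are assumptions inferred from the token positions, not read from the file, and all names are illustrative:

```python
# Detect notes that start inside a given bar but end after it, i.e. the
# notes MuseScore merged across the barline.
def notes_spanning_bar(notes, bar_start, ticks_per_bar):
    """notes: list of (start_tick, end_tick) pairs."""
    bar_end = bar_start + ticks_per_bar
    return [(s, e) for s, e in notes if bar_start <= s < bar_end and e > bar_end]

# The distinct (start, end) pairs from the last bar above:
last_bar_notes = [(48240, 48720), (49680, 50160), (51120, 55200)]
```

Under these assumptions, only the chord ending at tick 55200 is flagged as extending past the bar.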

REMI tokens in last bar

TimeSig_8/4 Position_0 Tempo_53.0 Program_0 Pitch_58 Velocity_79 Duration_0.1.8 Position_1 Program_0 Pitch_61 Velocity_79 Duration_0.1.8 Position_2 Program_0 Pitch_66 Velocity_79 Duration_0.1.8 Position_3 Program_0 Pitch_67 Velocity_79 Duration_0.1.8 Position_4 Program_0 Pitch_69 Velocity_79 Duration_2.0.8 Position_12 Program_0 Pitch_73 Velocity_79 Duration_1.0.8 Program_0 Pitch_77 Velocity_79 Duration_1.0.8 Program_0 Pitch_81 Velocity_79 Duration_1.0.8 Position_24 Tempo_92.0 Program_0 Pitch_58 Velocity_79 Duration_0.1.8 Position_25 Program_0 Pitch_61 Velocity_79 Duration_0.1.8 Position_26 Program_0 Pitch_66 Velocity_79 Duration_0.1.8 Position_27 Program_0 Pitch_67 Velocity_79 Duration_0.1.8 Position_28 Program_0 Pitch_69 Velocity_79 Duration_2.0.8 Position_36 Program_0 Pitch_73 Velocity_79 Duration_1.0.8 Program_0 Pitch_77 Velocity_79 Duration_1.0.8 Program_0 Pitch_81 Velocity_79 Duration_1.0.8 Position_48 Tempo_53.0 Program_0 Pitch_58 Velocity_79 Duration_0.1.8 Position_49 Program_0 Pitch_61 Velocity_79 Duration_0.1.8 Position_50 Program_0 Pitch_66 Velocity_79 Duration_0.1.8 Position_51 Program_0 Pitch_67 Velocity_79 Duration_0.1.8 Position_52 Program_0 Pitch_69 Velocity_79 Duration_9.2.4 Position_60 Program_0 Pitch_73 Velocity_79 Duration_8.2.4 Program_0 Pitch_77 Velocity_79 Duration_8.2.4 Program_0 Pitch_81 Velocity_79 Duration_8.2.4

Compared with the MIDI, in the original MXL the chord at the end is divided into two parts across two measures. Moreover, REMI did not split it during tokenization.

other

When I used the REMI detokenizer and saved the result as MIDI, then opened the converted MIDI with MuseScore, I found that it generated several empty measures (measures 7 and 14).

6338816_Étude No. 4.zip

I'm looking into it.
First thing I can say is that the MIDI -> tokens -> MIDI conversion works as it should in one_token_stream mode (meaning that the two tracks are merged into one before converting into tokens).


Sorry, I missed that requirement. It now outputs normally when detokenizing.

Well, thank you! Thanks to this MIDI, I realized that decoding in multiple-stream mode with several identical programs would merge tracks. I'm working on a fix.

Your file covered a previously uncovered case when using both time signatures and rests, which I managed to fix!
After #81 is merged, your previous code should work with and without one_token_stream 🙌

Edit: if you ever encounter other problematic cases, please don't hesitate to share them; it's thanks to reports like this that bugs can be found and fixed :)

Great, thank you very much. Also, when I modified the tokenizer to split notes so that the duration within each measure would not exceed the time signature, I did get the correct number of measures. However, this introduced another problem: notes at measure boundaries are split in two, requiring additional markers or a merge step during detokenization to stay consistent with the original. Moreover, I realized that a correct measure count only matters when MusicXML is involved; other use cases may not need this feature.

I'm not sure I follow 100% what's being done exactly 🤔

What I mean is that if a note's duration spans two measures, MusicXML splits it into two notes connected with a tie to indicate that they are played as one. However, the MIDI tokenizer places such a note in a single measure with a longer duration, e.g. 9.0.4 when the time signature is 8/4, so the total duration of the measure exceeds the time signature and the measure count in the tokens is reduced. To solve this, I need to split the notes that exceed the measure duration and put the excess part in the next measure. That in turn introduces another problem: during detokenization, these notes need to be merged back.
(Firstly, to explain why there are fewer measures; secondly, I hope this feature may be available in the future.)

If a note's duration spans two measures, then in MusicXML it is split into two notes connected with a tie to indicate that they are played as one.

Ok, thank you for clarifying, I'm not very familiar with musicXML.
But I am still not sure what the problem is and what you are trying to solve. Are you trying to do a musicXML --> MIDI --> tokens --> MIDI --> musicXML conversion pipeline, hoping to retrieve the exact same musicXML at the end? In that case, I now understand the need to split notes spanning multiple measures.

More broadly, regarding support for musicXML in MidiTok: this could be done one day, if there is enough demand or if someone is motivated to add it. It shouldn't be that hard, basically just adding a layer of musicXML <--> MIDI conversion.

Yes, what I tried was the conversion pipeline audio -> MIDI file -> MIDI tokens -> MXL tokens -> MusicXML. Sometimes the number of notes in the MIDI file is too large and it needs to be split into multiple parts (each part holding multiple measures), so the number and position of notes must be aligned between the MIDI (tokens) and the MXL (tokens).
In addition, for MIDI -> MusicXML conversion people usually use MuseScore, but MuseScore changes the original tempo and time signature and makes many errors. I haven't yet found a good solution to this (training a model yourself works relatively better), while there is usually no problem with MusicXML -> MIDI conversion.

What do you mean by "MIDI tokens (MidiTok) --> MXL tokens"?
You could split the notes of the MIDI before the MIDI --> MusicXML conversion.
This could be done by detecting the bars according to the time signature changes, then analysing each note and splitting those that span two measures. You could take inspiration from the REMI.add_time_events method, which determines the number of ticks per bar depending on the TS.
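The splitting step suggested above could be sketched like this. It is a simplified sketch, not miditok code: it assumes bars of constant length anchored at tick 0, whereas real handling of time signature changes would need per-segment bar lengths; all names are illustrative.

```python
# Cut a note at every bar boundary so that no segment extends past the
# bar it starts in (the segments would then be tied back together on the
# MusicXML side).
def split_note_at_bars(start, end, ticks_per_bar):
    """Return a list of (start, end) segments, one per bar crossed."""
    segments = []
    while start < end:
        # End of the bar the current segment starts in
        bar_end = (start // ticks_per_bar + 1) * ticks_per_bar
        segments.append((start, min(end, bar_end)))
        start = bar_end
    return segments
```

With 3840-tick bars, the long chord from the earlier example (ticks 51120 to 55200) is cut into two segments at the bar boundary at tick 53760, while a note contained in a single bar is returned unchanged.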

Ultimately, this is something that could be worth integrating into MuseScore, if I follow correctly.

Yes, your understanding is absolutely correct. I have already implemented this part, but the overall effectiveness is limited due to the scarcity of data.

This issue is stale because it has been open for 30 days with no activity.