Natooz / MidiTok

MIDI / symbolic music tokenizers for Deep Learning models 🎶

Home Page: https://miditok.readthedocs.io/


Windows: FileExistsError running data_augmentation on data_set.

efraimdahl opened this issue · comments

Cheers! Running the tokenizer on a dataset raises a FileExistsError on Windows.

Steps to reproduce:


```python
from miditok import MMM
from pathlib import Path

# Create the tokenizer and list the file paths
tokenizer = MMM()  # using default parameters (constants.py)
midi_paths = list(Path("path", "to", "midi").glob("**/*.mid"))

# Data augmentation over 2 pitch octaves, 1 velocity and 1 duration value
data_augmentation_offsets = [2, 1, 1]
tokenizer.tokenize_midi_dataset(
    midi_paths,
    Path("path", "to", "non_bpe_tokens"),
    data_augment_offsets=data_augmentation_offsets,
)
```

The following error occurs, and prevents further data_augmentation:


```
Tokenizing MIDIs (to/non_bpe_tokens): 100% 5/5 [00:00<00:00, 22.08it/s]
Performing data augmentation:   0%|                                  | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\xyz\Preprocessing\tokenizer.py", line 9, in <module>
    tokenizer.tokenize_midi_dataset(midi_paths, Path("path","to","non_bpe_tokens"),
  File "D:\Programs\Anaconda\Lib\site-packages\miditok\midi_tokenizer.py", line 1883, in tokenize_midi_dataset
    data_augmentation_dataset(out_dir, self, *data_augment_offsets, copy_original_in_new_location=True)
  File "D:\Programs\Anaconda\Lib\site-packages\miditok\data_augmentation\data_augmentation.py", line 164, in data_augmentation_dataset
    saving_path.parent.mkdir(parents=True, exist_ok=True)
  File "D:\Programs\Anaconda\Lib\pathlib.py", line 1116, in mkdir
    os.mkdir(self, mode)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'path\\to\\non_bpe_tokens\\1.json'
```

The problem seems to lie in the call to the data_augmentation_dataset function inside tokenize_midi_dataset() in miditok/midi_tokenizer.py. The pip-released version passes copy_original_in_new_location=True (see the traceback above), whereas the version of the file here on GitHub correctly sets it to False.
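For context on why exist_ok=True does not prevent the crash: Path.mkdir(exist_ok=True) only suppresses FileExistsError when the existing entry is a directory; if a regular file already occupies the path, the error is re-raised. A minimal stdlib-only sketch (the file name is hypothetical, chosen to mirror the token file in the traceback):

```python
import tempfile
from pathlib import Path

# Create a temp directory containing a regular file named "1.json",
# mirroring the token file already present in the output folder.
with tempfile.TemporaryDirectory() as tmp:
    blocker = Path(tmp) / "1.json"
    blocker.write_text("{}")

    try:
        # exist_ok=True does NOT help here: the path exists but is a
        # file, not a directory, so mkdir re-raises FileExistsError.
        blocker.mkdir(parents=True, exist_ok=True)
    except FileExistsError as exc:
        print(f"FileExistsError: {exc}")
```

This is the same failure mode as the traceback: the directory being created collides with a previously written .json file.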

Hi, thank you for the report!
Fortunately, it has very recently been fixed in #109!
You can get the fix by installing from git: pip install git+https://github.com/Natooz/MidiTok
Or it will be featured in the next release, which should come this week.

This issue is stale because it has been open for 30 days with no activity.

This issue was closed because it has been inactive for 14 days since being marked as stale.