openvpi / SOME

SOME: Singing-Oriented MIDI Extractor.

Can this get the MIDI sequence and MIDI duration sequence?

francqz31 opened this issue · comments

@yqzhishen Hello, it's me again. I just noticed this project. As we discussed before, can it be used to get the
"MIDI sequence | MIDI duration sequence", especially for English?

Like the ones in opencpop, as you said before: "filename | lyrics | phoneme sequence | MIDI sequence | MIDI duration sequence | phoneme duration sequence | is slur sequence"

I'm not looking for something to get the phoneme sequence, the phoneme duration sequence, or the is-slur sequence right now,
just an accurate MIDI sequence and MIDI duration sequence. That's what I need! So can SOME do that?

Thanks in advance!

openvpi/DiffSinger#29

Yes, SOME is language-independent, but you still need *.ds + *.wav to train.

@yqzhishen So SOME doesn't produce the MIDI duration sequence? Just the MIDI sequence?

At inference time SOME produces 3 outputs:

  • note_seq, the MIDI pitch sequence
  • note_rest, to indicate whether the note is a rest note
  • note_dur, the MIDI duration sequence in seconds
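As a rough sketch of how these three outputs could be turned into opencpop-style "MIDI sequence" and "MIDI duration sequence" fields: the snippet below assumes the outputs have already been converted to equal-length Python sequences. The function names, the "rest" label, and the sharp-only note spelling are illustrative choices, not part of SOME.

```python
# Illustrative helper names; nothing here is part of the SOME codebase.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_name(pitch):
    """Round a (possibly fractional) MIDI pitch and name it, e.g. 60 -> 'C4'."""
    number = int(round(pitch))
    return f"{NOTE_NAMES[number % 12]}{number // 12 - 1}"


def to_opencpop_fields(note_seq, note_rest, note_dur):
    """Build the two space-separated fields from the three per-note outputs.

    note_seq:  MIDI pitches (may be fractional)
    note_rest: truthy where the note is a rest
    note_dur:  durations in seconds
    """
    if not (len(note_seq) == len(note_rest) == len(note_dur)):
        raise ValueError("all three sequences must have the same length")
    names = [
        "rest" if is_rest else midi_to_name(pitch)
        for pitch, is_rest in zip(note_seq, note_rest)
    ]
    durations = [str(round(float(d), 6)) for d in note_dur]
    return " ".join(names), " ".join(durations)


if __name__ == "__main__":
    midi_field, dur_field = to_opencpop_fields(
        [60.2, 0.0, 61.7], [False, True, False], [0.5, 0.25, 1.0]
    )
    print(midi_field, "|", dur_field)
```

The pitch value of a rest entry is ignored here, so whatever the model emits for rests does not leak into the note names.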

OK, fair enough, thanks again. I will reopen the issue if the training scripts get released and I have issues with them. I will be waiting, since I want to train in English.

Hello again, Mr. yqzhishen.
I ran python infer.py --model CKPT_PATH --wav WAV_PATH
and got this output:
"accumulate_grad_batches: 1, audio_sample_rate: 44100, binarization_args: {'num_workers': 0, 'shuffle': True}, binarizer_cls: preprocessing.MIDIExtractionBinarizer, binary_data_dir: data/some_ds_fixmel_spk3_aug8/binary,
clip_grad_norm: 1, dataloader_prefetch_factor: 2, ddp_backend: nccl, ds_workers: 4, finetune_ckpt_path: None,
finetune_enabled: False, finetune_ignored_params: [], finetune_strict_shapes: True, fmax: 8000, fmin: 40,
freezing_enabled: False, frozen_params: [], hop_size: 512, log_interval: 100, lr_scheduler_args: {'min_lr': 1e-05, 'scheduler_cls': 'lr_scheduler.scheduler.WarmupLR', 'warmup_steps': 5000},
max_batch_frames: 80000, max_batch_size: 8, max_updates: 10000000, max_val_batch_frames: 10000, max_val_batch_size: 1,
midi_extractor_args: {'attention_drop': 0.1, 'attention_heads': 8, 'attention_heads_dim': 64, 'conv_drop': 0.1, 'dim': 512, 'ffn_latent_drop': 0.1, 'ffn_out_drop': 0.1, 'kernel_size': 31, 'lay': 8, 'use_lay_skip': True}, midi_max: 128, midi_min: 0, midi_num_bins: 256, midi_prob_deviation: 0.5,
midi_shift_proportion: 0.0, midi_shift_range: [-6, 6], model_cls: modules.model.Gmidi_conform.midi_conforms, num_ckpt_keep: 5, num_sanity_val_steps: 1,
num_valid_plots: 300, optimizer_args: {'beta1': 0.9, 'beta2': 0.98, 'lr': 0.0001, 'optimizer_cls': 'torch.optim.AdamW', 'weight_decay': 0}, pe: rmvpe, pe_ckpt: pretrained/rmvpe/model.pt, permanent_ckpt_interval: 40000,
permanent_ckpt_start: 200000, pl_trainer_accelerator: auto, pl_trainer_devices: auto, pl_trainer_num_nodes: 1, pl_trainer_precision: 32-true,
pl_trainer_strategy: auto, raw_data_dir: [], rest_threshold: 0.1, sampler_frame_count_grid: 6, seed: 114514,
sort_by_len: True, task_cls: training.MIDIExtractionTask, test_prefixes: None, train_set_name: train, units_dim: 80,
units_encoder: mel, units_encoder_ckpt: pretrained/contentvec/checkpoint_best_legacy_500.pt, use_buond_loss: True, use_midi_loss: True, val_check_interval: 4000,
valid_set_name: valid, win_size: 2048
| load 'model' from '/content/SOME/model_steps_64000_simplified.ckpt'.
100% 1/1 [00:01<00:00, 1.84s/it]
MIDI file saved at: '/content/SOME/202.mid'

It converted the singing wav file into MIDI. Now how can I get the MIDI sequence and MIDI duration sequence from this MIDI file? What should I do?

Thanks in advance!

You can use any editor or package that supports importing/extracting the MIDI file format. But if you are able to read the code, you can get the raw outputs in infer.py before the MIDI file is saved.
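For the first option, here is a minimal sketch using the third-party pretty_midi package, which is only one of several packages that can read MIDI files. It assumes the saved file holds a single monophonic melody. The helper names are made up for illustration, and silence between notes is represented as a rest entry with pitch None.

```python
def notes_to_sequences(notes, min_gap=1e-3):
    """Turn (pitch, start, end) triples, in seconds, into two parallel lists.

    Returns (pitches, durations). A silence longer than min_gap between
    consecutive notes becomes a rest entry whose pitch is None.
    """
    pitches, durations = [], []
    cursor = 0.0  # end time of the material emitted so far
    for pitch, start, end in sorted(notes, key=lambda note: note[1]):
        if start - cursor > min_gap:
            pitches.append(None)
            durations.append(start - cursor)
        pitches.append(pitch)
        durations.append(end - start)
        cursor = max(cursor, end)
    return pitches, durations


def load_notes(midi_path):
    """Read every note from a MIDI file (requires: pip install pretty_midi)."""
    import pretty_midi  # imported lazily so the helper above has no dependency

    midi = pretty_midi.PrettyMIDI(midi_path)
    return [
        (note.pitch, note.start, note.end)
        for instrument in midi.instruments
        for note in instrument.notes
    ]


if __name__ == "__main__":
    demo = [(60, 0.5, 1.0), (62, 1.0, 1.5), (64, 2.0, 2.5)]
    print(notes_to_sequences(demo))
```

With a file such as the 202.mid mentioned in this thread, the call would be notes_to_sequences(load_notes("202.mid")).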

Well, unfortunately I have no idea how to do either of these two things. How can I get the raw outputs before the MIDI file is saved?

What are you using the MIDI file for?

Here, midis holds the raw outputs:

SOME/infer.py

Line 37 in e0ca1ed

midis = infer_ins.infer([c['waveform'] for c in chunks])
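To make the raw outputs more concrete, here is a hedged sketch of stitching per-chunk results into one timeline. It assumes that each element of midis is a mapping holding the three per-note outputs, and that the start time of every chunk in seconds is available. The key names below mirror the output names given earlier in this thread and may differ from the ones actually used in the repository. The function name is hypothetical.

```python
def merge_chunks(offsets, midis, min_gap=1e-3):
    """Concatenate per-chunk note outputs into a single timeline.

    offsets: start time of each chunk in seconds (assumed to be available)
    midis:   one mapping per chunk with 'note_seq', 'note_rest', 'note_dur'
             (assumed key names; check them against the real outputs)
    Silence between chunks is filled with a rest note.
    """
    seq, rest, dur = [], [], []
    cursor = 0.0  # end time of the material emitted so far
    for offset, segment in sorted(zip(offsets, midis), key=lambda pair: pair[0]):
        gap = offset - cursor
        if gap > min_gap:
            seq.append(0.0)
            rest.append(True)
            dur.append(gap)
            cursor = offset
        for pitch, is_rest, length in zip(
            segment["note_seq"], segment["note_rest"], segment["note_dur"]
        ):
            seq.append(float(pitch))
            rest.append(bool(is_rest))
            dur.append(float(length))
            cursor += float(length)
    return seq, rest, dur


if __name__ == "__main__":
    fake_outputs = [  # made-up values, for illustration only
        {"note_seq": [60.0, 62.0], "note_rest": [False, False], "note_dur": [0.5, 0.5]},
        {"note_seq": [64.0], "note_rest": [False], "note_dur": [1.0]},
    ]
    print(merge_chunks([0.0, 2.0], fake_outputs))
```

The merged lists can then be written out in whatever text format the target dataset expects.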

I'm using the MIDI file to build a dataset like opencpop but in English. I already have a way to get the phoneme sequence and duration, and now I'm looking to get the MIDI sequence and duration from SOME.

some_batch_infer.zip

Maybe this script can help, but it is not well-documented. You need to put it in your SOME directory, edit the parameters and options in the file, and run it.

Oh, thanks so much. I edited the parameters (input_csv, out_csv, wav_folder, model_path) and got this:
csv_datas:1
success: 0
My result.csv looks just like my transcriptions.csv (I put the English transcription of my wav file in transcriptions.csv).