phoneme-informed-note-level-singing-transcription

A pretrained model for "A Phoneme-informed Neural Network Model for Note-level Singing Transcription", ICASSP 2023

Requirements

  • torch==1.13.1
  • torchaudio==0.13.1
  • nnAudio==0.3.2
  • mido==1.2.10
  • mir_eval==0.7
  • librosa==0.9.1
  • numpy==1.23.4
  • wquantiles==0.4
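These pinned versions can be installed directly with pip, for example as below (a sketch; whether these exact pins resolve cleanly depends on your Python version and platform):

$ pip install torch==1.13.1 torchaudio==0.13.1 nnAudio==0.3.2 mido==1.2.10 mir_eval==0.7 librosa==0.9.1 numpy==1.23.4 wquantiles==0.4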

Or if you are using Poetry, you can install the dependencies by running

$ poetry install

Usage

$ python infer.py checkpoints/model.pt INPUT_FILE OUTPUT_FILE --bpm BPM_OF_INPUT_FILE --device DEVICE
  • INPUT_FILE is the path to the input audio file.
  • OUTPUT_FILE is the path to the output MIDI file (defaults to out.mid if omitted).
  • BPM_OF_INPUT_FILE is the BPM of the input audio file (defaults to 120 if omitted).
  • DEVICE is the device to run the model on (defaults to cuda:0 if available, otherwise cpu).
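For example, to transcribe a 95 BPM recording on the first GPU (the file names here are placeholders):

$ python infer.py checkpoints/model.pt singing.wav singing.mid --bpm 95 --device cuda:0

The resulting MIDI can be inspected with mido, which is already among the dependencies (a minimal sketch, assuming the default out.mid output name):

import mido

# Load the transcribed output (out.mid is the default output file name).
midi = mido.MidiFile("out.mid")

# Print every note event produced by the transcription.
for track in midi.tracks:
    for msg in track:
        if msg.type in ("note_on", "note_off"):
            print(msg)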

Blog with Demo Examples

https://seyong92.github.io/phoneme-informed-transcription-blog/

Remarks

To pull the model checkpoint from the GitHub repository, Git LFS is needed.
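With Git LFS installed, a typical way to fetch it is (repository URL inferred from the repository name):

$ git lfs install
$ git clone https://github.com/seyong92/phoneme-informed-note-level-singing-transcription.git

If you already cloned the repository without the checkpoint, running $ git lfs pull inside the clone fetches the LFS-tracked files.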

If you have trouble downloading the model checkpoint through Git LFS, the checkpoint is also available at this link.

License

MIT License

