Try the model here. Check out a video showcase here.
osuT5 is a transformer-based encoder-decoder model that takes spectrogram inputs and generates osu! hit-object events as output. The goal of this project is to automatically generate osu! beatmaps from any song.
This project is heavily inspired by Google Magenta's MT3, quoting their paper:
This sequence-to-sequence approach simplifies transcription by jointly modeling audio features and language-like output dependencies, thus removing the need for task-specific architectures.
A high-level overview of the model's input and output is as follows:
The model takes Mel spectrogram frames as encoder input, with one frame per input position. At each step, the decoder outputs a softmax distribution over a discrete, predefined vocabulary of events. Outputs are sparse: events are emitted only when a hit-object occurs, rather than annotating every single audio frame.
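To make the sparse event format concrete, here is a minimal sketch of how a hit-object could be serialized into discrete decoder events. The event types and values are illustrative only, not osuT5's actual vocabulary:

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical event types; the real osuT5 vocabulary differs.
class EventType(Enum):
    TIME_SHIFT = "time_shift"  # advance the time cursor (ms)
    POS_X = "pos_x"            # playfield x-coordinate
    POS_Y = "pos_y"            # playfield y-coordinate
    CIRCLE = "circle"          # place a hit circle

@dataclass
class Event:
    type: EventType
    value: int = 0

# A single hit circle at t=1000ms, (x=256, y=192) becomes a short
# run of events; audio frames with no hit-objects emit nothing.
events = [
    Event(EventType.TIME_SHIFT, 1000),
    Event(EventType.POS_X, 256),
    Event(EventType.POS_Y, 192),
    Event(EventType.CIRCLE),
]
```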
The instructions below allow you to generate beatmaps on your local machine.
Clone the repo, create a Python virtual environment, and activate it.
git clone https://github.com/gyataro/osuTransformer.git
cd osuTransformer
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
Install ffmpeg, PyTorch, and the remaining Python dependencies.
pip install -r requirements.txt
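ffmpeg is a system dependency and is installed outside pip. A typical setup on a Debian-based system might look like the following; adjust the package manager for your platform, and pick the PyTorch build matching your CUDA version:

```
sudo apt install ffmpeg   # or: brew install ffmpeg on macOS
pip install torch         # default build; see pytorch.org for CUDA-specific builds
```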
Download the latest model from the release section.
Run inference.py and pass in the arguments below to generate beatmaps.
python -m inference \
model_path=[PATH TO DOWNLOADED MODEL] \
audio_path=[PATH TO INPUT AUDIO] \
output_path=[PATH TO OUTPUT DIRECTORY] \
bpm=[BEATS PER MINUTE OF INPUT AUDIO] \
offset=[START OF BEAT, IN MILLISECONDS, FROM THE BEGINNING OF INPUT AUDIO] \
title=[SONG TITLE] \
artist=[SONG ARTIST]
Example:
python -m inference \
model_path="./osuT5_model.bin" \
audio_path="./song.mp3" \
output_path="./output" \
bpm=120 \
offset=0 \
title="A Great Song" \
artist="A Great Artist"
The instructions below set up a training environment on your local machine.
Clone the repo, create a Python virtual environment, and activate it.
git clone https://github.com/gyataro/osuTransformer.git
cd osuTransformer
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
Install ffmpeg, PyTorch, and the remaining Python dependencies.
pip install -r requirements.txt
The dataset is available on Kaggle. You can also prepare your own dataset.
kaggle datasets download -d gernyataro/osu-beatmap-dataset
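The Kaggle CLI saves the dataset as a zip archive in the current directory; extract it before training (the target directory here is just an example):

```
unzip osu-beatmap-dataset.zip -d ./dataset
```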
All configurations are located in ./configs/train.yaml. Begin training by calling train.py.
python train.py
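Hyperparameters are normally edited in train.yaml directly. If the training script accepts the same key=value overrides as inference.py does, individual settings can also be changed on the command line; the keys below are hypothetical, so check configs/train.yaml for the real names:

```
python train.py device=gpu batch_size=8   # hypothetical keys; see configs/train.yaml
```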
Special thanks to:
- The authors of nanoT5 for their T5 training code.
- The Hugging Face team for their tools.
- The osu! community for the beatmaps.
- osu! Beatmap Generator by Syps (Nick Sypteras)
- osumapper by kotritrona, jyvden, Yoyolick (Ryan Zmuda)
- osu! Diffusion by OliBomby (Olivier Schipper), NiceAesth (Andrei Baciu)