Try the model here. Check out a video showcase here.
osuT5 is a transformer-based encoder-decoder model that takes spectrogram inputs and generates osu! hit-object events as output. The goal of this project is to automatically generate osu! beatmaps from any song.
This project is heavily inspired by Google Magenta's MT3, quoting their paper:
This sequence-to-sequence approach simplifies transcription by jointly modeling audio features and language-like output dependencies, thus removing the need for task-specific architectures.
A high-level overview of the model's input and output is as follows:
The model takes Mel spectrogram frames as encoder input, with one frame per input position. At each step, the decoder outputs a softmax distribution over a discrete, predefined vocabulary of events. Outputs are sparse: events are emitted only when a hit-object occurs, rather than annotating every single audio frame.
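To make the sparse event format concrete, here is a minimal sketch of how a hit-object could be serialized into discrete decoder events. The event types and values are illustrative only, not osuT5's actual vocabulary:

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical event types; the real osuT5 vocabulary differs.
class EventType(Enum):
    TIME_SHIFT = "time_shift"  # advance the time cursor (ms)
    POS_X = "pos_x"            # playfield x-coordinate
    POS_Y = "pos_y"            # playfield y-coordinate
    CIRCLE = "circle"          # place a hit circle

@dataclass
class Event:
    type: EventType
    value: int = 0

# A single hit circle at t=1000ms, (x=256, y=192) becomes a short
# run of events; audio frames with no hit-objects emit nothing.
events = [
    Event(EventType.TIME_SHIFT, 1000),
    Event(EventType.POS_X, 256),
    Event(EventType.POS_Y, 192),
    Event(EventType.CIRCLE),
]
```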
The instructions below allow you to generate beatmaps on your local machine.
Clone the repo, create a Python virtual environment, and activate it.
git clone https://github.com/gyataro/osuTransformer.git
cd osuTransformer
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
Install ffmpeg, PyTorch, and the remaining Python dependencies.
pip install -r requirements.txt
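ffmpeg is a system dependency and is installed outside pip. A typical setup on a Debian-based system might look like the following; adjust the package manager for your platform, and pick the PyTorch build matching your CUDA version:

```
sudo apt install ffmpeg   # or: brew install ffmpeg on macOS
pip install torch         # default build; see pytorch.org for CUDA-specific builds
```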
Download the latest model from the release section.
Run inference.py and pass in the arguments below to generate beatmaps.
python -m inference \
model_path=[PATH TO DOWNLOADED MODEL] \
audio_path=[PATH TO INPUT AUDIO] \
output_path=[PATH TO OUTPUT DIRECTORY] \
bpm=[BEATS PER MINUTE OF INPUT AUDIO] \
offset=[START OF BEAT, IN MILLISECONDS, FROM THE BEGINNING OF INPUT AUDIO] \
title=[SONG TITLE] \
artist=[SONG ARTIST]
Example:
python -m inference \
model_path="./osuT5_model.bin" \
audio_path="./song.mp3" \
output_path="./output" \
bpm=120 \
offset=0 \
title="A Great Song" \
artist="A Great Artist"
The instructions below set up a training environment on your local machine.
Clone the repo, create a Python virtual environment, and activate it.
git clone https://github.com/gyataro/osuTransformer.git
cd osuTransformer
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
Install ffmpeg, PyTorch, and the remaining Python dependencies.
pip install -r requirements.txt
The dataset is available on Kaggle. You can also prepare your own dataset.
kaggle datasets download -d gernyataro/osu-beatmap-dataset
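The Kaggle CLI saves the dataset as a zip archive in the current directory; extract it before training (the target directory here is just an example):

```
unzip osu-beatmap-dataset.zip -d ./dataset
```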
All configurations are located in ./configs/train.yaml. Begin training by calling train.py.
python train.py
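Hyperparameters are normally edited in train.yaml directly. If the training script accepts the same key=value overrides as inference.py does, individual settings can also be changed on the command line; the keys below are hypothetical, so check configs/train.yaml for the real names:

```
python train.py device=gpu batch_size=8   # hypothetical keys; see configs/train.yaml
```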
Special thanks to:
- The authors of nanoT5 for their T5 training code.
- The Hugging Face team for their tools.
- The osu! community for the beatmaps.
- osu! Beatmap Generator by Syps (Nick Sypteras)
- osumapper by kotritrona, jyvden, Yoyolick (Ryan Zmuda)
- osu! Diffusion by OliBomby (Olivier Schipper), NiceAesth (Andrei Baciu)