Audio Sample Generator

This app generates new audio samples with Stable Diffusion, using a LoRA trained on spectrograms of your existing samples.

Steps:

Convert existing audio samples to spectrograms.
Train a small LoRA (low-rank adapter) for Stable Diffusion on the spectrograms.
Generate new audio samples from a prompt you specify.
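The first step turns audio into an image that Stable Diffusion can learn from. The app's own conversion is not documented here and probably differs (for example, it may use mel-scaled frequencies), so treat the following as a minimal stdlib sketch of the general idea only. It slices the signal into overlapping Hann-windowed frames and takes the DFT magnitude of each frame. `stft_magnitude`, `frame_size` and `hop` are hypothetical names.

```python
import cmath
import math


def stft_magnitude(signal, frame_size=8, hop=4):
    """Naive magnitude spectrogram: Hann-windowed frames + direct DFT.

    Returns one list per time frame, each holding frame_size // 2 + 1
    magnitudes (from the DC bin up to the Nyquist bin).
    """
    # Periodic Hann window tapers frame edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    # Slide a frame_size window over the signal in steps of hop samples.
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):
            # Direct DFT of bin k; O(N^2), acceptable for illustration only.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


if __name__ == "__main__":
    # A sine at exactly 2 cycles per 8-sample frame should peak in bin 2.
    tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
    for frame in stft_magnitude(tone):
        print([round(m, 2) for m in frame])
```

With the demo tone, every frame has magnitudes of roughly 0, 1, 2, 1, 0 across its five bins: the Hann window spreads a pure tone into the two neighbouring bins, which is the usual trade-off for lower leakage elsewhere.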

Prerequisites

Python
Git (used to clone the repository and its submodules)

How to install

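Clone the repository together with its submodules, then make the launch script executable. The SSH URL below requires a GitHub SSH key; without one, clone https://github.com/Danand/audio-sample-generator.git instead.

```shell
git clone --recurse-submodules git@github.com:Danand/audio-sample-generator.git
cd audio-sample-generator
chmod +x run.sh
```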

How to launch

From the repository root, run:

```shell
./run.sh
```

How to use

Work through the pages in the sidebar in order.

This guide skips the advanced settings for brevity.

Extract Spectrograms

Open audio files.
Click the Extract button.
Review the spectrograms extracted from the audio files.
Proceed to the next page.
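Stable Diffusion works on images, so extracted magnitudes have to become pixel values. One common encoding, assumed here rather than taken from the app, is decibels relative to the loudest bin, clipped at a floor and scaled to 8-bit gray levels, with high frequencies on the top row. `to_grayscale` and `floor_db` are hypothetical; the input is a list of magnitude frames (one list per time frame).

```python
import math


def to_grayscale(frames, floor_db=-80.0):
    """Map magnitude frames to 8-bit pixel rows (0 = silence, 255 = loudest).

    Rows are frequency bins (highest frequency on the top row) and
    columns are time frames, the usual image layout for spectrograms.
    """
    # Avoid dividing by zero when the input is completely silent.
    peak = max(max(frame) for frame in frames) or 1.0
    n_bins = len(frames[0])
    image = []
    for k in reversed(range(n_bins)):  # high frequencies first
        row = []
        for frame in frames:
            # Decibels relative to the loudest bin, clipped at floor_db.
            db = 20 * math.log10(max(frame[k] / peak, 1e-12))
            db = max(db, floor_db)
            # Linear map: floor_db -> 0, 0 dB -> 255.
            row.append(round(255 * (db - floor_db) / -floor_db))
        image.append(row)
    return image


if __name__ == "__main__":
    print(to_grayscale([[0.0, 1.0, 0.1], [0.5, 0.5, 0.0]]))
```

The logarithmic scale matters because audio magnitudes span several orders of magnitude; a linear mapping would leave almost every pixel near black.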

Prepare Dataset

For each spectrogram, specify:
- Subject
- Caption (comma-separated keywords)
- Weight (optional)
Click the Save button.
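The annotations above can be pictured as one record per spectrogram. The sketch below is a hypothetical data model: the app's storage format, how it joins Subject with Caption, and how it applies Weight are not documented here. Weight is modelled as a relative sampling probability during training, which is only one plausible interpretation. `DatasetEntry`, `parse_keywords` and `sample_batch` are illustrative names.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One training example: a spectrogram image plus its annotations."""
    image_path: str
    subject: str
    keywords: list = field(default_factory=list)
    weight: float = 1.0  # optional; 1.0 means "no extra emphasis"

    def caption(self) -> str:
        # Subject first, then the keywords, e.g. "kick, punchy, short".
        return ", ".join([self.subject, *self.keywords])


def parse_keywords(text: str) -> list:
    """Split a comma-separated caption field, dropping blanks and padding."""
    return [part.strip() for part in text.split(",") if part.strip()]


def sample_batch(entries, k, rng=None):
    """Draw k entries with probability proportional to each entry's weight."""
    rng = rng or random.Random()
    return rng.choices(entries, weights=[e.weight for e in entries], k=k)


if __name__ == "__main__":
    entry = DatasetEntry("kick_01.png", "kick", parse_keywords("punchy, short"))
    print(entry.caption())
```

Keeping the subject as the first token gives every caption a stable anchor word, which is a common convention when captioning LoRA training data.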

Train LoRA

Click the Train button.
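Training a LoRA does not update the whole model. In the LoRA method, each adapted weight matrix W stays frozen and a low-rank update B·A is learned and scaled by alpha / r; B usually starts at zero, so training begins from the unmodified model. The app's rank, alpha and target layers are not stated here, so the values below are arbitrary. This is a plain-Python sketch of one adapted linear layer; `lora_forward` and `trainable_params` are illustrative names.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x)

    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))  # low-rank path: d_in -> r -> d_out
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def trainable_params(d_in, d_out, r):
    """Parameters in A and B, versus d_out * d_in for a full fine-tune."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 1.0]]
    B = [[0.0], [0.0]]  # zero init: output equals the base layer's
    print(lora_forward(W, A, B, [2.0, 3.0], alpha=2.0, r=1))
    print(trainable_params(768, 768, 4), "vs", 768 * 768)
```

For a 768 × 768 layer at rank 4 this trains 6,144 parameters instead of 589,824, which is why a LoRA can be trained on a small spectrogram dataset with modest hardware.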

Generate Audio with Stable Diffusion

Enter a prompt.
Set Amount to the number of audio samples to generate.
Click the Generate button.
Listen to the generated samples and save the ones you want.
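Each generated sample starts from its own random noise, which is why several samples from one prompt sound different. How the app seeds the Amount samples is not documented; a common pattern, assumed here, is consecutive seeds so that any single result can be regenerated. Stable Diffusion's latents have 4 channels at 1/8 of the image resolution; the tiny default shape is only for illustration, and `initial_latents` is a hypothetical name.

```python
import random


def initial_latents(amount, base_seed, shape=(4, 8, 8)):
    """One Gaussian noise tensor (nested lists) per requested sample.

    Sample i uses seed base_seed + i, so it can be reproduced later
    by calling initial_latents(1, base_seed + i).
    """
    channels, height, width = shape
    latents = []
    for i in range(amount):
        rng = random.Random(base_seed + i)  # independent, reproducible stream
        latents.append([[[rng.gauss(0.0, 1.0) for _ in range(width)]
                         for _ in range(height)]
                        for _ in range(channels)])
    return latents


if __name__ == "__main__":
    batch = initial_latents(3, base_seed=42)
    print(len(batch), "noise tensors, first value:", round(batch[0][0][0][0], 3))
```

Giving every sample its own seeded generator, rather than drawing all samples from one stream, keeps a sample reproducible even if Amount changes.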

Extras

Batch Convert to Audio

This page converts spectrograms to audio samples in batches. It accepts any image of the matching size, not only spectrograms, so you can experiment with arbitrary images.
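Converting an image back to audio starts by reading pixel values as magnitudes. The sketch below assumes the image stores decibels as 8-bit gray levels with high frequencies on the top row; the app's actual encoding and required size are not stated here, so `expected_size` is a parameter and `pixels_to_magnitudes` and `batch_convert` are hypothetical names. A spectrogram image carries no phase, so a real converter must still estimate it (for example with the Griffin-Lim algorithm), which is not shown.

```python
def pixels_to_magnitudes(image, expected_size, floor_db=-80.0):
    """Undo an 8-bit dB mapping: return magnitude frames, one per column.

    image is a list of pixel rows (highest frequency on the top row);
    expected_size is (width, height). Magnitudes are relative to the
    loudest possible bin (pixel 255 -> 1.0, i.e. 0 dB).
    """
    width, height = expected_size
    if len(image) != height or any(len(row) != width for row in image):
        raise ValueError(f"expected a {width}x{height} image")
    frames = []
    for col in range(width):
        frame = []
        for row in reversed(range(height)):  # bottom row = lowest bin
            db = image[row][col] / 255 * -floor_db + floor_db
            frame.append(10 ** (db / 20))
        frames.append(frame)
    return frames


def batch_convert(images, expected_size):
    """Convert a {name: image} mapping, skipping images of the wrong size."""
    results, skipped = [], []
    for name, image in images.items():
        try:
            results.append((name, pixels_to_magnitudes(image, expected_size)))
        except ValueError:
            skipped.append(name)
    return results, skipped


if __name__ == "__main__":
    converted, rejected = batch_convert({"ok": [[255, 0]], "bad": [[1]]}, (2, 1))
    print([name for name, _ in converted], rejected)
```

Checking the size up front mirrors the requirement that images have the matching size, and skipping mismatches lets one bad file leave the rest of the batch intact.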

About

Generating unique one-shot audio samples with Stable Diffusion.
