CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis

This repo contains a PyTorch implementation for the paper CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis by Simon Rouard and Gaëtan Hadjeres accepted at ISMIR 2021. You can hear some material on this link.

We propose to use the continuous framework of diffusion models to the task of unconditional audio generation on drum sounds.

Moreover, the flexibility of diffusion models lets us perform sound design on drums such as : regeneration of variations of a sound, class-conditional/class mixing generation, interpolations between sounds or inpainting. By using the latent representation given by the forward Ordinary Differential Equation, you can also load any 44.1kHz drum sound and manipulate it. It has to be of length 21.000 if you use the pretrained checkpoints provided.

Requirements

Run the following line in your terminal in order to install all the requirements

pip install -r requirements.txt

What is in the repo?

All the python files excepts inference.py, model_classifier.py and inference_notebook.ipynb are dedicated to the training of the model on a mono sound dataset. To train a model, you need to adapt the params.py file to your configuration. Then, you just have to run:

python3 __main__.py

You can monitor the model during training by running: tensorboard --logdir weights (if you chose 'weights' as 'model_dir' in the params.py file)

The file model_classifier.py contains the architecture of the noise conditioned classifier necessary to the class-conditional generations. It has only been trained on VP SDEs.
The file inference.py contains all the types of sampling. See the Jupyter Notebook inference_notebook.ipynb to understand all the possibilities that the model offers.

Checkpoints of the model

You can download the folder with the saved weights on this link. Then, put the folder saved_weights in your repository so that the notebook works well.

How to extend the code?

New SDEs: you can train the model with a new SDE by creating a new class in the sde.py file. It must contain the functions sigma(t), mean(t), beta(t) and g(t) which are linked in the Appendix D of the paper, formula (33).

simonrouard / CRASH