This project was carried out for the Musical Machine Learning course of the ATIAM master's programme, supervised by Philippe Esling, Théis Bazin and Constance Douwes.
- Python 3.8 (may work on older versions of Python 3)
- Clone this repository:

  ```sh
  git clone https://github.com/aRI0U/spectrogram-inpainting.git
  ```
- Install all requirements:

  ```sh
  cd spectrogram-inpainting/code
  pip install -r requirements.txt
  ```
- That's all folks!
Experiments have been conducted on images (MNIST) and audio (NSynth).
There is no need to run any separate script to download/extract datasets. They will be downloaded the first time you run the main Python program.
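The download-on-first-run behaviour follows a common pattern: check whether the dataset folder already exists, and fetch it only if it does not. Here is a minimal, generic sketch of that pattern — not the repository's actual code; `ensure_dataset` and its arguments are hypothetical:

```python
import os
import urllib.request

def ensure_dataset(root, url, downloader=urllib.request.urlretrieve):
    """Download the dataset archive into `root` only if it is missing.

    `downloader` is injectable so the network call can be stubbed in tests.
    Returns True if a download was triggered, False if data was already there.
    """
    if os.path.isdir(root) and os.listdir(root):
        return False  # dataset already present, nothing to do
    os.makedirs(root, exist_ok=True)
    archive = os.path.join(root, "dataset.tar.gz")
    downloader(url, archive)  # fetch the archive (extraction would follow)
    return True
```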
In order to train a model on MNIST, just type the following:
```sh
python main.py -c configs/mnist.json
```
In order to train a model on NSynth, just type the following:
```sh
python main.py -c configs/nsynth.json
```
Other options can be modified through configuration files or command-line arguments. For an exhaustive description of these options, type `python main.py --help`.
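The usual way such a setup combines a JSON configuration file with command-line overrides can be sketched as follows. This is an illustration of the general pattern only; the option names (`batch_size`, `lr`) are invented and are not the project's actual options:

```python
import argparse
import json

def parse_config(argv):
    """Merge built-in defaults, a JSON config file, and CLI overrides.

    Precedence (lowest to highest): defaults < config file < command line.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--config", default=None)
    # hypothetical options, for illustration only
    parser.add_argument("--batch_size", type=int, default=None)
    parser.add_argument("--lr", type=float, default=None)
    args = parser.parse_args(argv)

    options = {"batch_size": 32, "lr": 1e-3}      # defaults
    if args.config is not None:
        with open(args.config) as f:
            options.update(json.load(f))          # config file overrides defaults
    for key in ("batch_size", "lr"):
        value = getattr(args, key)
        if value is not None:
            options[key] = value                  # CLI overrides config file
    return options
```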
Models can be inspected using TensorBoard. In order to inspect models, type:
```sh
tensorboard --logdir logs
```
and open http://localhost:6006 in a web browser.
This repository has the following architecture:
- `code` contains all our implementation
  - `code/configs` contains some JSON files with the command-line arguments we used for our experiments
  - `code/datamodules` contains everything related to data (automatic downloading and extracting, data transforms, batch loading...)
  - `code/datasets` is the place where datasets are stored (not in the repo). For example, the validation set of NSynth will be stored in `code/datasets/NSynth/valid`, and so on
  - `code/experiments` contains the notebooks we used for experiments. In particular, the naive Bayes classifier we used for spectrogram inpainting is located in this folder
  - `code/model` contains all neural network architectures: encoders, decoders, quantizers and the main VQ-VAE pipeline
  - `code/tests` contains some tests
  - `code/utils` contains several utilities (parser, CO2 tracker, etc.)
- `docs` contains the subject of this project
- `report` contains the source code of our report
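As a rough illustration of what the quantizer step of a VQ-VAE does (this is a generic vector-quantization sketch, not the code in `code/model`): each encoder output vector is replaced by its nearest codebook entry, and the decoder only ever sees these discrete codes.

```python
def quantize(vectors, codebook):
    """Map each vector to the index and value of its nearest codebook entry.

    `vectors` and `codebook` are lists of equal-length float lists.
    Returns (indices, quantized_vectors); in a VQ-VAE the decoder receives
    the quantized vectors instead of the raw encoder outputs.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    indices, quantized = [], []
    for v in vectors:
        # nearest-neighbour search over the codebook (squared Euclidean distance)
        i = min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
        indices.append(i)
        quantized.append(codebook[i])
    return indices, quantized
```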
TODO:

- audio in TensorBoard
- finish debugging the model
- (linear) transformers for generation
- use the phase
- try other transforms than the basic spectrogram (mel...)
- use a ResNet for the encoder/decoder
- VQ-VAE-2