Exploiting Pre-trained Feature Networks for Generative Adversarial Networks in Audio-domain Loop Generation
Authors: Yen-Tung Yeh, Bo-Yu Chen, Yi-Hsuan Yang
This is the official repository containing the code for the ISMIR 2022 paper "Exploiting Pre-trained Feature Networks for Generative Adversarial Networks in Audio-domain Loop Generation". (PyTorch)
We provide pre-trained models to generate drum loops and synth loops.

First, create the conda environment:

$ conda env create -f environment.yml
Download the pre-trained checkpoints with gdown:

- drum fusion: `gdown --id 15h07E__BxvaHqBiLwQLe2BzbRU8uPnHF`
- drum vgg: `gdown --id 1cXGXA2b8nLKEClGSPW6ttiwwWErL1Fx4`
- synth fusion: `gdown --id 1phrhU-QxQc33adIigzl_6GycmpHvwmo1`
- synth vgg: `gdown --id 1I4RFYFTRzL6V0V8ndFqNHnJIDlyFzf-p`
Generate audio with our pre-trained models:

- Download a pre-trained checkpoint from the links above.
- Put the checkpoint into `checkpoint/[loops_type]/[config]`, where `loops_type` can be `drum` or `synth`, and `config` can be `vgg` or `fusion`. For example, if you download the drum fusion checkpoint, put it into the `checkpoint/drum/fusion` folder.
- Go to Pytorch VGGish and replace `feature_networks/torchvggish` with it. That is, `feature_networks/torchvggish` should contain the same code as the Pytorch VGGish repo.
- Generate a one-bar drum loop (`vgg`): `bash quick_start/generate_drum_vgg.sh`
- Generate a one-bar drum loop (`fusion`): `bash quick_start/generate_drum_fusion.sh`
- Generate a one-bar synth loop (`vgg`): `bash quick_start/generate_synth_vgg.sh`
- Generate a one-bar synth loop (`fusion`): `bash quick_start/generate_synth_fusion.sh`
We collected drum loops and synth loops from Looperman. Unfortunately, we cannot release our dataset due to licensing issues.

In the following, we describe how to train the model from scratch. We largely follow the preprocessing method of the LoopTest repository. To preprocess the data, modify settings such as the data path in the code, then run the scripts in the preprocess directory in the following order:
python trim_2_seconds.py
python extract_mel.py
python make_dataset.py
python compute_mean_std.py
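As a rough sketch of what the mean/std step computes (this is illustrative only, not the repository's actual code, and it assumes mel-spectrograms stored as `.npy` files):

```python
import numpy as np
from pathlib import Path

def compute_mean_std(mel_dir):
    """Aggregate a single mean/std over all mel-spectrograms in a directory.

    Illustrative sketch: assumes each .npy file holds one mel-spectrogram;
    the real script may normalize per-bin or store the values differently.
    """
    values = np.concatenate(
        [np.load(p).reshape(-1) for p in sorted(Path(mel_dir).glob("*.npy"))]
    )
    return values.mean(), values.std()
```

The resulting mean and std are the statistics that `data_path` points to later, when generated mel-spectrograms are mapped back to the original scale before vocoding.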
Check scripts/train.sh and modify the arguments for training:

python3 train.py \
--size 64 --batch 64 --sample_dir [sample dir] \
--checkpoint_dir [model ckpt dir] --loops_type [kinds of loops] \
--fusion [fusion or not] \
--pre_trained_model_type [type of pre-trained feature network] \
--pre_trained_model_path [pre-trained feature network ckpt path] \
[mel-spectrogram from the previous pre-processing step]
- `sample_dir` is the directory to store the generated mel-spectrograms during training.
- `checkpoint_dir` is the directory to store model checkpoints.
- `loops_type` specifies which kind of loops to generate: `"synth"` or `"drums"`.
- `fusion` indicates whether to use fusion: `"on"` or `"off"`.
- `pre_trained_model_type` selects the pre-trained feature network: `"vgg"`, `"autotagging"`, or `"loops_genre"`.
- `pre_trained_model_path` is the path to the pre-trained feature network checkpoint.
Note: if `fusion` is `"on"`, then `pre_trained_model_type` can only be `"loops_genre"` or `"autotagging"`. This is because the fusion method already uses `"vgg"` by default.
Last, run the following command.
bash scripts/train.sh
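Conceptually, the role of the pre-trained feature network can be sketched as follows. This is a minimal, hypothetical illustration of reusing a frozen feature extractor inside a GAN discriminator, not the paper's actual architecture; `feature_net`, `feat_dim`, and the class name are our own assumptions:

```python
import torch
import torch.nn as nn

class FeatureDiscriminator(nn.Module):
    """Illustrative sketch: a real/fake head on top of a frozen feature net."""

    def __init__(self, feature_net, feat_dim=128):
        super().__init__()
        self.feature_net = feature_net
        for p in self.feature_net.parameters():
            p.requires_grad = False  # keep the pre-trained network frozen
        self.head = nn.Linear(feat_dim, 1)  # only the head is trained

    def forward(self, mel):
        with torch.no_grad():
            feats = self.feature_net(mel)  # pre-trained features
        return self.head(feats)  # real/fake logit
```

Freezing the feature network means the discriminator leans on representations learned from large-scale audio data instead of learning them from the (much smaller) loop dataset.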
In the following section, we describe how to generate audio. Check scripts/generate_audio.sh and modify the arguments:
python3 inference.py \
--ckpt [generator checkpoint] \
--pics [number of clips to generate] \
--data_path [mean and std for generation] \
--store_path [where to store audio] \
--vocoder_folder "melgan/drum_vocoder"
- `ckpt` is the path of the checkpoint used to generate audio.
- `pics` is the number of mel-spectrograms to generate.
- `data_path` stores the mean and std of the dataset: `"data/drum"` or `"data/synth"`.
- `store_path` is the directory where the generated audio is stored.
- `vocoder_folder` contains the MelGAN vocoder used for generation; we provide a drum vocoder and a synth vocoder: `"melgan/drum_vocoder"` or `"melgan/synth_vocoder"`.
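The mean and std loaded from `data_path` are typically used to map generated mel-spectrograms back to the original scale before the vocoder turns them into audio. A minimal sketch (the function name is ours, not the repository's):

```python
import numpy as np

def denormalize(mel, mean, std):
    # Invert the (mel - mean) / std normalization applied during training.
    return mel * std + mean
```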
Last, run the following command.
bash scripts/generate_audio.sh
We use MelGAN as the vocoder. We trained it on the Looperman dataset and provide two checkpoints in the melgan directory: one for drum loops and the other for synth loops.
Our code borrows heavily from the repositories below.
If you find this repo useful, please kindly cite the following information.
@inproceedings{arthur2022pjloopgan,
  title={Exploiting Pre-trained Feature Networks for Generative Adversarial Networks in Audio-domain Loop Generation},
  author={Yen-Tung Yeh and Bo-Yu Chen and Yi-Hsuan Yang},
  booktitle={Proc. Int. Society for Music Information Retrieval Conf.},
  year={2022},
}