janzuiderveld / continuous-audio-representations

Official PyTorch implementation for "Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations".

Paper: https://arxiv.org/abs/2111.08462

Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations

Official PyTorch implementation for "Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations". Accepted to the "Deep Generative Models and Downstream Applications" (oral) and "Machine Learning for Creativity and Design" (poster) workshops at NeurIPS 2021.

See INR-collection for more details on the implicit neural representation implementations used here.

Several (implicit) decoder architectures, ablations and latent embedding inference methods are implemented.

Quickstart

git clone https://github.com/janzuiderveld/continuous-audio-representations
cd continuous-audio-representations

pip3 install -r requirements.txt

Running Train.py automatically downloads the datasets and puts them in place.
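As a minimal sketch (not part of the repository), the hypothetical helper build_train_command below assembles a Train.py invocation from keyword arguments. Only the flag names and values are taken from this README; the helper itself is illustrative.

```python
import shlex


def build_train_command(**flags):
    """Hypothetical helper: turn keyword arguments into Train.py CLI flags.

    Each keyword maps one-to-one onto a --flag documented in this README.
    """
    cmd = ["python3", "Train.py"]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_train_command(
    architecture="pi-gan_wide",
    meta_architecture="autoencoder",
    dataset_name="NSYNTH.keyboard_baseline",
)
print(shlex.join(cmd))  # shell-quoted command line, for copy-pasting
```

Run from the repository root, the list can be passed directly to subprocess.run(cmd, check=True).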

Usage:

The following architectures are available; train one by passing its tag as the --architecture argument to Train.py:

  • wavegan (TCNN)
  • im-net
  • pi-gan (PCINR)
    • pi-gan_prog
    • pi-gan_sine_first
    • pi-gan_sine_last
    • pi-gan_relu
    • pi-gan_concat_middle
    • pi-gan_concat_all
    • pi-gan_min_mapping
    • pi-gan_five_mapping
    • pi-gan_shrinking
    • pi-gan_deep
    • pi-gan_wide (PCINR Wide)
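The pi-gan variants take their name from pi-GAN, which conditions SIREN-style sinusoidal layers on a latent code through FiLM (per-layer frequencies gamma and phase shifts beta). As a rough, pure-Python sketch of what such a decoder computes, the code below evaluates a tiny randomly initialized network at individual time coordinates. Assumptions, not taken from this repository: omega_0 multiplies the pre-activation as in SIREN (first_omega_0 for the first layer, hidden_omega_0 afterwards), gamma and beta come from a single hypothetical linear mapping of the latent code, and all layer sizes are illustrative.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer y = Wx + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_linear(fan_in, fan_out, bound, rng):
    """Weights drawn from U(-bound, bound); biases start at zero for simplicity."""
    weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]
    return weights, [0.0] * fan_out


def film_sine(layer, x, gamma, beta, omega_0):
    """FiLM-conditioned sine activation: sin(gamma * omega_0 * (Wx + b) + beta)."""
    pre = linear(*layer, x)
    return [math.sin(g * omega_0 * p + b) for g, p, b in zip(gamma, pre, beta)]


def build_decoder(hidden=16, depth=3, num_latent=8, first_omega_0=155.0,
                  hidden_omega_0=390.0, seed=0):
    """Random toy decoder: `depth` FiLM sine layers plus a linear output layer."""
    rng = random.Random(seed)
    hidden_bound = math.sqrt(6 / hidden) / hidden_omega_0  # SIREN hidden-layer init
    layers = [init_linear(1, hidden, 1.0, rng)]  # SIREN first layer: U(-1/fan_in, 1/fan_in), fan_in = 1
    layers += [init_linear(hidden, hidden, hidden_bound, rng) for _ in range(depth - 1)]
    out = init_linear(hidden, 1, hidden_bound, rng)
    # Hypothetical mapping network: one linear map from z to every layer's (gamma, beta).
    mapping = init_linear(num_latent, 2 * hidden * depth, 1 / math.sqrt(num_latent), rng)
    return {"layers": layers, "out": out, "mapping": mapping, "hidden": hidden,
            "omegas": (first_omega_0, hidden_omega_0)}


def decode(decoder, z, t):
    """Evaluate the latent-conditioned INR at a single time coordinate t."""
    hidden, (first_omega_0, hidden_omega_0) = decoder["hidden"], decoder["omegas"]
    film = linear(*decoder["mapping"], z)
    h = [t]
    for i, layer in enumerate(decoder["layers"]):
        chunk = film[2 * hidden * i: 2 * hidden * (i + 1)]
        gamma = [1.0 + g for g in chunk[:hidden]]  # frequencies centred around 1
        beta = chunk[hidden:]  # phase shifts
        omega_0 = first_omega_0 if i == 0 else hidden_omega_0
        h = film_sine(layer, h, gamma, beta, omega_0)
    return linear(*decoder["out"], h)[0]  # linear output: one audio sample value


decoder = build_decoder()
z = [0.1] * 8  # latent code for one clip
waveform = [decode(decoder, z, -1.0 + 2.0 * k / 99) for k in range(100)]
```

Because the latent code only enters through gamma and beta, changing z changes the output waveform without touching the decoder weights, which is what makes the representation conditional.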

Latent embedding inference methods (--meta_architecture):

  • autoencoder
  • autodecoder
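The auto-decoder strategy (popularized by DeepSDF) has no encoder: each training clip owns a latent code that is optimized by gradient descent alongside the decoder, and a new clip is embedded by optimizing a fresh code against the frozen decoder. An autoencoder instead predicts the code with an encoder network in one forward pass. The sketch below shows only the auto-decoder embedding step, under loud assumptions: a toy linear decoder stands in for the INR, the optimizer is plain gradient descent, and the learning rate is illustrative rather than the --latent_lr default.

```python
import random


def decode(A, z):
    """Frozen toy decoder y = A z, standing in for the trained INR."""
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]


def mse(A, z, target):
    return sum((y - t) ** 2 for y, t in zip(decode(A, z), target)) / len(target)


def latent_step(A, z, target, latent_lr):
    """One gradient-descent step on z for mean((A z - target)^2); A stays fixed."""
    n = len(target)
    residual = [y - t for y, t in zip(decode(A, z), target)]
    grad = [2.0 / n * sum(A[i][j] * residual[i] for i in range(n)) for j in range(len(z))]
    return [zj - latent_lr * gj for zj, gj in zip(z, grad)]


rng = random.Random(0)
num_latent, num_points = 4, 32
A = [[rng.gauss(0, 1) for _ in range(num_latent)] for _ in range(num_points)]
target = decode(A, [rng.gauss(0, 1) for _ in range(num_latent)])  # clip to embed

z = [rng.gauss(0, 0.001) for _ in range(num_latent)]  # small init, cf. --latent_init_std
initial = mse(A, z, target)
for iteration in range(100):
    for _ in range(3):  # inner steps per iteration, cf. --latent_descent_steps
        z = latent_step(A, z, target, latent_lr=0.1)  # illustrative lr only
final = mse(A, z, target)
print(f"MSE {initial:.4f} -> {final:.2e}")
```

During training, the same per-clip update would run while the decoder weights are also being optimized.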

The following datasets are downloaded automatically when passed as --dataset_name:

  • SPEECHCOMMANDS
  • NSYNTH.diverse_baseline
  • NSYNTH.keyboard_baseline

Alternatively, pass the path to your own folder of .wav files:

  • "Path/to/your/folder/containing/.wav/files"
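When a folder path is passed as --dataset_name, the clips have to be read and brought to a common length (--audio_length). A minimal stdlib-only sketch follows, assuming 16-bit mono PCM files and simple crop/zero-pad behaviour; load_wav and load_folder are hypothetical names, and the repository's own loader may resample, normalize, or accept other formats.

```python
import struct
import wave
from pathlib import Path


def load_wav(path, audio_length=16000):
    """Read a 16-bit PCM mono .wav file as floats in [-1, 1),
    cropped or zero-padded to exactly audio_length samples."""
    with wave.open(str(path), "rb") as f:
        if f.getsampwidth() != 2 or f.getnchannels() != 1:
            raise ValueError(f"{path}: expected 16-bit mono PCM")
        frames = f.readframes(f.getnframes())
    ints = struct.unpack(f"<{len(frames) // 2}h", frames)  # little-endian int16
    samples = [s / 32768.0 for s in ints[:audio_length]]
    return samples + [0.0] * (audio_length - len(samples))


def load_folder(folder, audio_length=16000):
    """Load every .wav file directly inside folder, sorted by file name."""
    return [load_wav(p, audio_length) for p in sorted(Path(folder).glob("*.wav"))]
```

Sorting the file names keeps the clip order, and therefore the index of each clip's latent embedding, stable between runs.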

Full list of parameters:

Flag Default Description
--help show this help message and exit
--wandb 0 Enable wandb logging
--wandb_project_name default Name of wandb project
--dataset_name SPEECHCOMMANDS Which dataset to train on.
--dataset_size 128 Number of samples to train on. The maximum is 1024 for the provided datasets.
--audio_length 16000 Audio length (in samples)
--autoconfig 0 Enable autoconfig. Overrides omega_0 values depending on dataset and architecture for tested setups.
--lr 1e-05 Learning rate
--batch_size 32 Batch size
--num_epochs 5001 Number of epochs
--use_gpu 1 Enable GPU
--use_multi_gpu 0 Enable multiple GPUs
--architecture pi-gan What architecture to use as the decoder.
--meta_architecture autodecoder What latent embedding inference method to use.
--num_latent 256 Number of latent dimensions
--double 0 Enable double precision throughout training
--weight_norm 0 Enable weight norm
--first_omega_0 155 First layer input scaling for sinusoidal architectures
--hidden_omega_0 390 Hidden layer input scaling for sinusoidal architectures
--coord_multi 1 Input scaling for any architecture
--latent_init_std 0.001 Latent embedding initialization std
--latent_descent_steps 1 Number of gradient descent steps per iteration for latent embedding optimization
--latent_lr 0.3 Learning rate for latent optimization.
--samples_per_datapoint 8000 Number of samples per datapoint.
--sample_even 1 Sample coordinates with equal spacing.
--per_sample 1 Multiplier for the per-sample MSE term in the objective function.
--deriv_per_sample 0 Multiplier for the per-sample MSE of the function's derivative in the objective function.
--cdpam 1 Multiplier for the CDPAM term in the objective function.
--multiscale_STFT 1 Multiplier for the multi-scale STFT term in the objective function.
--weight_decay 1 L2 weight decay amount.
--eval_every 500 Evaluate every n iterations
--save_audio_plots 1 Save audio plots at every evaluation
--save_latents 1 Save latent embeddings
--save_audio 1 Save generated audio at end of training.
--save_model 1 Save model at end of training.
--eval_samples 1 Number of samples to evaluate on.
--eval_upscale_ratio 1 Upscale ratio for evaluation of generations.
--save_path auto Path to save output. 'auto' creates directories based on setup.
--max_high_res_batch_size 16 Maximum batch size for high resolution evaluations.
--note_general default Note to add to general output directory name
--note default Note to filter wandb results.
--num_groups 0 Number of groups to use for progressive activation scaling in pi-gan_prog
--prog_weight_decay_factor 0 Weight decay reduction factor for progressive weight decay.
--prog_weight_decay_every 0 Number of iterations after which to reduce weight decay.
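Several flags above (--audio_length, --samples_per_datapoint, --sample_even, --coord_multi, --eval_upscale_ratio) control which time coordinates the INR is queried at. The sketch below is one plausible reading of them, not the repository's code: coordinates are sample positions mapped to [-1, 1] and scaled by coord_multi, "even" sampling uses a fixed stride with a random offset, and evaluation uses a denser grid over the same interval. All function names are hypothetical.

```python
import random


def normalized_coords(indices, audio_length, coord_multi=1.0):
    """Map sample indices 0..audio_length-1 to [-1, 1], then scale by coord_multi."""
    return [coord_multi * (2.0 * i / (audio_length - 1) - 1.0) for i in indices]


def sample_training_coords(audio_length=16000, samples_per_datapoint=8000,
                           sample_even=True, coord_multi=1.0, rng=random):
    """Choose which samples of a clip to fit in one step.

    Assumption: 'even' means a fixed stride with a random offset; otherwise
    indices are drawn uniformly without replacement.
    """
    if sample_even:
        stride = max(1, audio_length // samples_per_datapoint)
        offset = rng.randrange(stride)
        indices = list(range(offset, audio_length, stride))[:samples_per_datapoint]
    else:
        indices = sorted(rng.sample(range(audio_length), samples_per_datapoint))
    return indices, normalized_coords(indices, audio_length, coord_multi)


def eval_coords(audio_length=16000, eval_upscale_ratio=1, coord_multi=1.0):
    """Grid spanning the same interval with eval_upscale_ratio times more points."""
    n = audio_length * eval_upscale_ratio
    return [coord_multi * (2.0 * k / (n - 1) - 1.0) for k in range(n)]
```

Since an INR is a continuous function of time, querying it on the denser evaluation grid yields a higher-resolution waveform without retraining.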
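The objective is a weighted sum whose multipliers are --per_sample, --deriv_per_sample, --cdpam and --multiscale_STFT. As a hedged sketch, the code below implements the two waveform terms, using a first-order finite difference as a stand-in for the derivative, and accepts the perceptual terms as caller-supplied functions (a hypothetical hook, since CDPAM and multi-scale STFT losses need their own implementations). For that reason its perceptual weights default to 0, whereas the repository defaults both to 1.

```python
def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def finite_diff(x):
    """First-order difference, a simple stand-in for the signal derivative."""
    return [b - a for a, b in zip(x, x[1:])]


def objective(pred, target, per_sample=1.0, deriv_per_sample=0.0,
              cdpam=0.0, multiscale_STFT=0.0, extra_terms=None):
    """Weighted sum of loss terms, one multiplier per flag.

    extra_terms: hypothetical dict of callables f(pred, target) for the
    perceptual terms, keyed "cdpam" and "multiscale_STFT". A nonzero weight
    without a matching callable raises KeyError instead of being ignored.
    """
    extra_terms = extra_terms or {}
    loss = per_sample * mse(pred, target)
    if deriv_per_sample:
        loss += deriv_per_sample * mse(finite_diff(pred), finite_diff(target))
    for weight, name in ((cdpam, "cdpam"), (multiscale_STFT, "multiscale_STFT")):
        if weight:
            loss += weight * extra_terms[name](pred, target)
    return loss
```

Setting a multiplier to 0 removes that term entirely, which is how the individual loss components can be ablated from the command line.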
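The progressive weight decay flags are only briefly described above. Assuming that a value of 0 for either flag disables the schedule, and that every --prog_weight_decay_every iterations the current weight decay is multiplied by --prog_weight_decay_factor, the schedule could be sketched as follows (weight_decay_at is a hypothetical name, and the multiplicative reading is an assumption).

```python
def weight_decay_at(iteration, weight_decay=1.0, prog_weight_decay_factor=0.0,
                    prog_weight_decay_every=0):
    """Weight decay in effect at a given iteration.

    Assumptions: a factor or interval of 0 disables the schedule, and each
    reduction multiplies the current value by prog_weight_decay_factor.
    """
    if prog_weight_decay_factor <= 0 or prog_weight_decay_every <= 0:
        return weight_decay
    reductions = iteration // prog_weight_decay_every
    return weight_decay * prog_weight_decay_factor ** reductions
```

In PyTorch, the returned value could be written into each entry of optimizer.param_groups under the "weight_decay" key before the optimizer step.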
