
Time Domain Neural Audio Style Transfer

NIPS2017 "Time Domain Neural Audio Style Transfer" code repository
https://arxiv.org/abs/1711.11160
Parag K. Mital

Presented at https://nips2017creativity.github.io

Colab: https://colab.research.google.com/drive/18_DckMGM-fsSthIqZI9sOAlY6I3plDLD

Introduction

A recently published method for audio style transfer has shown how to extend the process of image style transfer to audio. This method synthesizes audio "content" and "style" independently using the magnitudes of a short time Fourier transform, shallow convolutional networks with randomly initialized filters, and iterative phase reconstruction with Griffin-Lim. In this work, we explore whether it is possible to directly optimize a time domain audio signal, removing the process of phase reconstruction and opening up possibilities for real-time applications and higher quality syntheses. We explore a variety of style transfer processes on neural networks that operate directly on time domain audio signals and demonstrate one such network capable of audio stylization.
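
To make the idea concrete, here is a minimal, self-contained sketch of time-domain optimization (this is not the repository's code; the filter shapes, loss weights, and synthetic test signals are illustrative placeholders): a single randomly initialized convolution extracts features, "content" is matched on the raw activations, "style" on their Gram matrix, and gradient descent updates the waveform itself.

import numpy as np
import tensorflow as tf

sr = 16000
t = np.linspace(0, 1, sr, dtype=np.float32)
content_audio = np.sin(2 * np.pi * 440 * t)                              # stand-in for a real recording
style_audio = np.sin(2 * np.pi * 220 * t) * np.sign(np.sin(2 * np.pi * 3 * t))

def features(x, kernels):
    # x: [samples] raw waveform -> [1, time, filters] activations of one
    # shallow convolution with fixed, randomly initialized filters
    x = tf.reshape(x, [1, -1, 1])
    return tf.nn.relu(tf.nn.conv1d(x, kernels, stride=1, padding='SAME'))

def gram(f):
    # channel-by-channel correlations of the activations ("style")
    f = tf.squeeze(f, 0)
    return tf.matmul(f, f, transpose_a=True) / tf.cast(tf.shape(f)[0], tf.float32)

# fixed random filters: [filter_width, in_channels, out_channels]
kernels = tf.constant(np.random.randn(1024, 1, 128).astype(np.float32) * 1e-2)

content_feats = features(tf.constant(content_audio), kernels)
style_gram = gram(features(tf.constant(style_audio), kernels))

# optimize the raw waveform directly -- no phase reconstruction needed
synth = tf.Variable(content_audio.copy())
opt = tf.keras.optimizers.Adam(1e-3)
for _ in range(100):
    with tf.GradientTape() as tape:
        f = features(synth, kernels)
        loss = (tf.reduce_mean((f - content_feats) ** 2)
                + 1e-2 * tf.reduce_mean((gram(f) - style_gram) ** 2))  # style weight is illustrative
    opt.apply_gradients([(tape.gradient(loss, synth), synth)])

Because the optimization variable is the waveform itself, the result can be written out directly, with no Griffin-Lim step.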

Installation

Python 3.4+ is required. Magenta is needed only for the NSynth and WaveNet models (though I was unable to stylize audio using these models).

Code

The models folder contains three different modules. The timedomain module shows how to combine the different input features described in the paper, including the real, imaginary, magnitude, and phase components of a discrete Fourier transform, for performing time-domain audio style transfer. Have a look at the input_features argument for specifying which input features to use for the time-domain style transfer algorithm.
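
As a rough illustration of what those features look like (the module's actual implementation and naming may differ), the four candidate feature sets can be computed from framed DFTs of a signal:

import numpy as np

def dft_input_features(x, frame_size=1024, hop=256):
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([x[i:i + frame_size]
                       for i in range(0, len(x) - frame_size + 1, hop)])
    spectra = np.fft.rfft(frames * np.hanning(frame_size))
    # The candidate input features discussed in the paper.
    return {
        'real': spectra.real,
        'imag': spectra.imag,
        'magnitude': np.abs(spectra),
        'phase': np.angle(spectra),
    }

feats = dft_input_features(np.random.randn(16000))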

The uylanov module implements the spectrogram-based approach of Ulyanov et al.
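
For contrast with the time-domain formulation, that pipeline optimizes STFT magnitudes, so a waveform has to be recovered afterwards by iterative phase reconstruction. A sketch of that final step using librosa's Griffin-Lim implementation (the module itself may implement its own loop; the sine input is a stand-in for an optimized spectrogram):

import numpy as np
import librosa

# stand-in for a magnitude spectrogram produced by spectral-domain optimization
y = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 22050)).astype(np.float32)
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

# Griffin-Lim: iteratively estimate a phase consistent with the magnitudes
y_rec = librosa.griffinlim(S, n_iter=64, hop_length=512)

This reconstruction step is exactly what the time-domain approach removes.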

Finally, the nsynth module includes the NSynth autoencoder, showing how to use either its encoder or its decoder for audio stylization, though I was unable to produce any successful stylizations this way.
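
If you want to experiment with it anyway, the NSynth latent encoding can be obtained through Magenta's fastgen helpers, roughly like this (the checkpoint path, file names, and sample length are placeholders, and the exact API depends on your Magenta version):

from magenta.models.nsynth import utils
from magenta.models.nsynth.wavenet import fastgen

sample_length = 64000  # placeholder: ~4 seconds at 16 kHz
audio = utils.load_audio('content.wav', sample_length=sample_length, sr=16000)
# Encode the waveform into the NSynth autoencoder's latent representation.
encoding = fastgen.encode(audio, 'model.ckpt-200000', sample_length)
# Decode (slowly) back to audio with the WaveNet decoder.
fastgen.synthesize(encoding, save_paths=['stylized.wav'],
                   checkpoint_path='model.ckpt-200000')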

Usage

You can use any of the modules in the models folder (timedomain, uylanov, or nsynth) from the command line like so:

python models/timedomain.py
usage: timedomain.py [-h] -s STYLE -c CONTENT -o OUTPUT [-m MODE]

These take paths to the style and content files, or to folders of such files when mode is 'batch', e.g.:

python models/timedomain.py -s /path/to/style.wav -c /path/to/content.wav -o /path/to/output.wav

or:

python models/timedomain.py -s /path/to/style/wavs/folder -c /path/to/content/wavs/folder -o /path/to/output/wavs/folder -m batch

Audio Samples

This repository also includes audio samples from Robert Thomas (target/male-talking.wav), music by Robert Thomas and Franky Redente (corpus/robthomas*), samples and music by John Tejada and Reggie Watts (corpus/johntejada*, target/beat-box*, target/male-singing.wav, target/female-singing.wav), and one voice sample by Ashwin Vaswani (target/male-taal.wav). These clips were generously contributed by their authors and are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. That means the clips may not be used commercially. Further, any sharing of these clips must credit their authors and use the same license.

Example Outputs

The examples folder includes syntheses produced with the models/timedomain module and the original Ulyanov network in models/uylanov; these were created using the search.py script in the root of the repo.

License

Apache License 2.0

