Audio Style Transfer
This is a functional implementation of the artistic style transfer algorithm for audio. It uses convolutions with random weights to represent audio features.
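To illustrate what "convolutions with random weights" can mean for audio, here is a minimal NumPy sketch. It is not the notebook's Theano/Lasagne code: the function name `random_conv_features`, the filter count, the filter width, and the weight scaling are all assumptions chosen for illustration. It treats the frequency bins of a magnitude spectrogram as input channels and slides untrained filters along the time axis.

```python
import numpy as np


def random_conv_features(spectrogram, n_filters=64, filter_width=11, seed=0):
    """Apply untrained 1-D filters along time, followed by a ReLU.

    spectrogram: array of shape (n_freq_bins, n_frames); each frequency bin
    is treated as one input channel (an assumption of this sketch).
    Returns an array of shape (n_filters, n_frames - filter_width + 1).
    """
    rng = np.random.default_rng(seed)
    n_channels, _ = spectrogram.shape

    # Random weights that are never trained. The scale is a common
    # heuristic that keeps activations from growing with the filter size.
    std = np.sqrt(2.0 / (n_channels * filter_width))
    weights = rng.normal(0.0, std, size=(n_filters, n_channels, filter_width))

    # windows has shape (n_channels, n_out, filter_width).
    windows = np.lib.stride_tricks.sliding_window_view(
        spectrogram, filter_width, axis=1
    )

    # Cross-correlation (what deep learning libraries call "convolution"):
    # sum over channels and filter taps for every filter and output frame.
    features = np.einsum("fcw,cow->fo", weights, windows)
    return np.maximum(features, 0.0)


# Usage with a synthetic stand-in for a magnitude spectrogram.
demo_spectrogram = np.abs(np.random.default_rng(1).normal(size=(16, 50)))
demo_features = random_conv_features(demo_spectrogram, n_filters=8, filter_width=5)
print(demo_features.shape)
```

Because the weights are fixed by the seed, the same input always yields the same features, which is what lets an untrained network serve as a feature extractor.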
Dependencies
- python (tested with 2.7)
- Theano with Lasagne (installation instructions)
- librosa
pip install librosa
- numpy and matplotlib
The easiest way to install Python is to use Anaconda.
How to run
- Open audio_style_transfer.ipynb in Jupyter Notebook.
- If you want to use your own audio files as inputs, first cut them to 10 s length with:
ffmpeg -i yourfile.mp3 -ss 00:00:00 -t 10 yourfile_10s.mp3
- Set CONTENT_FILENAME and STYLE_FILENAME in the third cell of the notebook to your input files.
- Run all cells.
The most frequent problem is that either content or style dominates the output. To counter this, adjust the ALPHA parameter. A larger ALPHA means more content in the output, and ALPHA=0 means no content, which reduces stylization to texture generation. The example output outputs/imperial_usa.wav, which mixes the content of the Imperial March from Star Wars with the style of the U.S. National Anthem, was obtained with the default value ALPHA=1e-2.
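The role of ALPHA can be sketched as a weight on the content term of a combined loss. The code below is an assumption based on the usual formulation from the style transfer paper listed in the references (a feature-matching content loss plus a Gram-matrix style loss), not a copy of the notebook's loss. The names `gram_matrix` and `total_loss` are hypothetical.

```python
import numpy as np


def gram_matrix(features):
    """Correlations between filters, averaged over time.

    features: array of shape (n_filters, n_frames).
    """
    return features @ features.T / features.shape[1]


def total_loss(generated, content, style, alpha=1e-2):
    """Weighted sum of a content loss and a style loss (assumed form).

    All inputs are feature arrays of shape (n_filters, n_frames).
    """
    # Content: match the features frame by frame.
    content_loss = np.sum((generated - content) ** 2)
    # Style: match time-independent filter statistics only.
    style_loss = np.sum((gram_matrix(generated) - gram_matrix(style)) ** 2)
    return alpha * content_loss + style_loss


# Synthetic feature arrays, for illustration only.
rng = np.random.default_rng(0)
content_feats = rng.normal(size=(8, 40))
style_feats = rng.normal(size=(8, 40))

# With alpha=0 the content term vanishes, so the style features
# themselves already minimize the loss (pure texture generation).
print(total_loss(style_feats, content_feats, style_feats, alpha=0.0))
```

Under this form, raising `alpha` penalizes deviation from the content features more heavily, which matches the behavior described above: larger ALPHA, more content in the output.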
References
- Original paper on style transfer: A Neural Algorithm of Artistic Style
- Publications on texture generation with random convolutions:
  - Texture Synthesis Using Shallow Convolutional Networks with Random Filters
  - A Powerful Generative Model Using Random Weights for the Deep Image Representation