Unisound / SampleRNN

TensorFlow implementation of SampleRNN

SampleRNN

A TensorFlow implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model.

Requirements

  • TensorFlow 1.0
  • Python 2.7
  • Librosa  

Dataset  

We used 74 minutes of piano music as the training corpus. Any corpus of ".wav" files can be used instead.
For the Mandarin samples, we used 6 hours of human speech as the training corpus.

Samples

Pretrained model

Features

  • 2-tier SampleRNN
  • 3-tier SampleRNN
  • Linear quantization
  • Mu-law quantization
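
The two quantization schemes listed above can be illustrated with a minimal NumPy sketch. It is not taken from this repository's code: the function names are hypothetical, the input is assumed to be floating-point audio scaled to [-1, 1], and the default of 256 levels mirrors the --q_levels=256 flag used in the training command. Linear quantization spaces the bins uniformly, while mu-law first compresses the amplitude logarithmically so that quiet samples get finer resolution.

```python
import numpy as np


def linear_quantize(audio, q_levels=256):
    """Map floats in [-1, 1] to integer bins 0..q_levels-1, uniformly spaced."""
    audio = np.clip(audio, -1.0, 1.0)
    # Shift to [0, 1], scale to [0, q_levels - 1], round to the nearest bin.
    return np.rint((audio + 1.0) / 2.0 * (q_levels - 1)).astype(np.int32)


def mu_law_quantize(audio, q_levels=256):
    """Compress with the mu-law curve, then quantize uniformly."""
    mu = q_levels - 1
    audio = np.clip(audio, -1.0, 1.0)
    # Standard mu-law companding: sign(x) * ln(1 + mu*|x|) / ln(1 + mu).
    compressed = np.sign(audio) * np.log1p(mu * np.abs(audio)) / np.log1p(mu)
    return np.rint((compressed + 1.0) / 2.0 * mu).astype(np.int32)


def mu_law_dequantize(bins, q_levels=256):
    """Invert mu_law_quantize, returning approximate floats in [-1, 1]."""
    mu = q_levels - 1
    compressed = 2.0 * bins.astype(np.float64) / mu - 1.0
    # Inverse companding: sign(y) * ((1 + mu)**|y| - 1) / mu.
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu


if __name__ == "__main__":
    x = np.array([-0.5, 0.01, 0.5])
    print("linear bins:", linear_quantize(x))
    print("mu-law bins:", mu_law_quantize(x))
    print("mu-law round trip:", mu_law_dequantize(mu_law_quantize(x)))
```

Either function yields integer class indices that a model can predict with a softmax over q_levels outputs. The dequantizer is needed only when turning generated indices back into a waveform.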

Training

python train.py \
	--data_dir=./pinao-corpus \
	--silence_threshold=0.1 \
	--sample_size=102408 \
	--big_frame_size=8 \
	--frame_size=2 \
	--q_levels=256 \
	--rnn_type=GRU \
	--dim=1024 \
	--n_rnn=1 \
	--seq_len=520 \
	--emb_size=256 \
	--batch_size=64 \
	--optimizer=adam \
	--num_gpus=4

or

sh run.sh
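
As a rough guide to how the frame-related flags in the train.py command above fit together, the sketch below computes how many steps each tier would take over one training subsequence. It assumes the flags follow the three-tier design described in the SampleRNN paper: the slowest tier consumes big_frame_size samples per step, the middle tier consumes frame_size samples per step, and the sample-level tier handles one sample at a time. The helper name tier_steps and the divisibility checks are illustrative assumptions and have not been verified against train.py.

```python
def tier_steps(seq_len, big_frame_size, frame_size):
    """Return the number of steps each tier takes over seq_len samples.

    Assumes a three-tier hierarchy; the divisibility requirements below are
    an assumption about the training script, not confirmed behavior.
    """
    if seq_len % big_frame_size != 0:
        raise ValueError("seq_len must be a multiple of big_frame_size")
    if big_frame_size % frame_size != 0:
        raise ValueError("big_frame_size must be a multiple of frame_size")
    return {
        "big_frame_tier": seq_len // big_frame_size,  # slowest RNN tier
        "frame_tier": seq_len // frame_size,          # middle RNN tier
        "sample_tier": seq_len,                       # one step per sample
    }


if __name__ == "__main__":
    # Values copied from the training command above.
    print(tier_steps(seq_len=520, big_frame_size=8, frame_size=2))
```

With the values from the command, the slowest tier takes 65 steps, the middle tier 260, and the sample-level tier 520 for each subsequence.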

Related projects

This work is based on the following implementations, with some modifications:

License: GNU Lesser General Public License v3.0


Languages: Python 98.6%, Shell 0.8%, Batchfile 0.6%