soroushmehr / sampleRNN_ICLR2017

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Home Page: https://arxiv.org/abs/1612.07837

To continue test wav?

lukezos opened this issue

First: your work is absolutely awesome! The fact that the model is capable of generating a reasonably-sounding signal for many seconds (and hundreds of thousands of samples) is awesome.

Second: my question:
I have trained models on a music dataset successfully. I would like to see (hear) a continuation of a given wav file, generated by the model. Basically, I want to find out how well and for how long the model is able to continue the input sound.
Please give me a hint on how to do that,
thanks!
Lukas

Thank you! However, I think this will just generate a longer sequence initialised by (from two_tier.py):

# First half zero, others fixed random at each checkpoint

h0 = numpy.zeros(
        (N_SEQS-fixed_rand_h0.shape[0], N_RNN, H0_MULT*DIM),
        dtype='float32'
)
h0 = numpy.concatenate((h0, fixed_rand_h0), axis=0)

My point is to continue a "real" sequence instead (i.e. one taken from the test/validation/train npy file) and see how well the model is able to continue the current note, tempo, etc. (for the music dataset).
Should I just replace the above initialisation with feeding in a "real" sequence?

thanks,
Lukas

Basically, you'll use your audio to compute the hidden states of the RNN, and then you'll use them as the initial hidden state when you start generating.

This would amount to inserting a loop like https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L685 before this loop, except that samples will already contain the audio that you have (with all preprocessing) and you'll drop this line (https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/three_tier/three_tier.py#L702), i.e. you'll not be updating your seeded audio but only collecting the updated hidden states.

Then, you can use the generation loop with the new hidden states to generate audio.
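
For illustration, here is a minimal sketch of that warm-up idea. It is not the repo's actual code: generate_step below stands in for the compiled Theano generation function in three_tier.py, and the frame size, quantisation levels, and hidden-state shape are assumptions.

import numpy

FRAME_SIZE = 16   # samples per frame (assumed default)
Q_LEVELS = 256    # 8-bit quantisation (assumed)

def generate_step(frame, h0):
    # Stand-in for the compiled frame-level generation function in three_tier.py:
    # it consumes one frame and returns next-sample probabilities plus an updated
    # hidden state. A dummy uniform distribution keeps this sketch self-contained.
    probs = numpy.full(Q_LEVELS, 1.0 / Q_LEVELS)
    return probs, h0

# Preprocessed (quantised) seed audio; in a real run take this from your
# test/validation npy data instead of random values.
seed = numpy.random.randint(0, Q_LEVELS, size=16000).astype('int32')
n_seed = len(seed) - len(seed) % FRAME_SIZE

h0 = numpy.zeros((1, 1, 512), dtype='float32')  # initial hidden state (shape assumed)

# Warm-up: feed the seed frame by frame to update h0, but never write sampled
# values back over the seed (i.e. drop the update line at L702).
for t in range(0, n_seed, FRAME_SIZE):
    _, h0 = generate_step(seed[t:t + FRAME_SIZE], h0)

# h0 now encodes the seed; use it as the initial hidden state of the normal
# generation loop and generate the continuation from there.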

Alternatively, you can concatenate your audio before the zeros in the samples array, but when running the generation loop you will not update the samples array for the timesteps that correspond to the seeded audio.
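
A rough sketch of this alternative, continuing from the sketch above (generate_step, seed, n_seed, FRAME_SIZE and Q_LEVELS are the same assumed placeholders, not the repo's actual names):

n_generate = 4 * 16000                       # continue for ~4 s at 16 kHz (assumption)
samples = numpy.zeros(n_seed + n_generate, dtype='int32')
samples[:n_seed] = seed                      # seeded audio sits in front of the zeros

h0 = numpy.zeros((1, 1, 512), dtype='float32')
for t in range(FRAME_SIZE, n_seed + n_generate, FRAME_SIZE):
    probs, h0 = generate_step(samples[t - FRAME_SIZE:t], h0)
    if t >= n_seed:
        # Only timesteps past the seed get overwritten with sampled values.
        # (Drawing a whole frame from one distribution is a simplification;
        # the real loop samples one value at a time.)
        samples[t:t + FRAME_SIZE] = numpy.random.choice(
            Q_LEVELS, size=FRAME_SIZE, p=probs)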

Hi!

Thank you!
I went with the second option: concatenating my audio before the zeros in the samples array, and not updating the samples array for the timesteps that correspond to the seeded audio.

For best results, what should the length of the seeded audio be, with the default running parameters for the three_tier and two_tier models?

The longer, the better. In my opinion, around 1-2 seconds should be sufficient to capture the texture of the audio; however, it depends on many other things, like the kind of data the model was originally trained on, etc.