chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks

Single-frequency noisy sound in the result

RitoRitoni opened this issue · comments

Hi @chrisdonahue! I really appreciate the work in this paper; it's exactly what I needed for my bachelor thesis.
I am running the WaveGAN project through tensorflow-directml because I don't have an NVIDIA GPU.
Everything seemed to work fine until the generation part.
I first tried the WaveGAN algorithm on about 10 samples of the SC09 drums dataset. The result is a single-frequency, noisy metallic sound that has nothing to do with drums.
I got the same sound when I tried WaveGAN on my own dataset of 10-second WAVs of musical material. Here is the sound.

noiseSound.zip

- Does anyone have an idea what is happening?
- How long should I let the model train to get a decent result?
- Does the dataset need to be over 100 WAVs, for example, or more? Is there a minimum size below which the GAN can't be trained properly?
- Do the input and output WAV lengths have to be the same?

I would appreciate anybody's comments or help very much!
Regards,
Eleftheria

I downloaded your result and it sounds like mine.
I used my own dataset and the author's example piano dataset, and I copied the training and generation scripts from the README, but it doesn't work.
Have you found any solution?

By the way, I want to know how to generate a WAV file. I can only hear the result via a Jupyter notebook in my browser.

Hello, sorry for the late reply!
The issue was that it needed more training time. I managed to get decent results after roughly one night of training; I don't remember exactly, because once I was sure it worked I started trying it on my own dataset. Make sure you give it as much time as you can on the first tries so you learn approximately where to stop, and use TensorBoard to monitor training.

Thanks for your reply. I'm writing this reply to note another possible solution.

After reading your reply, I spent a whole week on training and still got nothing but noisy results.
Finally, I swapped my training GPU from a 3060 to a 1080 Ti, and then it worked! In my opinion, an incompatibility between RTX 30-series GPUs and CUDA 9.0 (the version compatible with TensorFlow 1.12) leads to incorrect computation results, and hence the noisy sound.

In the end, I have the same question: how do you generate a WAV file?

This is what I'm using to generate a WAV file:

import numpy as np
import tensorflow as tf
from scipy.io.wavfile import write as wavwrite
from IPython.display import Audio

tf.reset_default_graph()
saver = tf.train.import_meta_graph('./train_dir/infer/infer.meta')
graph = tf.get_default_graph()
sess = tf.InteractiveSession()

# First rename the checkpoint files (remove the -number suffix):
# model.ckpt-XX.data-00000-of-00001 ----> model.ckpt.data-00000-of-00001
# model.ckpt-XX.index ------------------> model.ckpt.index
# model.ckpt-XX.meta -------------------> model.ckpt.meta
saver.restore(sess, './train_dir/model.ckpt')

# Create 50 random latent vectors z
_z = (np.random.rand(50, 100) * 2.) - 1

# Synthesize G(z)
z = graph.get_tensor_by_name('z:0')
G_z = graph.get_tensor_by_name('G_z:0')
_G_z = sess.run(G_z, {z: _z})

# Method 1: write the first example directly with scipy (float32 input
# produces a 32-bit float WAV)
wavwrite('test-scipy.wav', 16000, _G_z[0, :, 0])

# Method 2: let IPython's Audio widget encode the WAV bytes
audio = Audio(_G_z[0, :, 0], rate=16000)
with open('test-ipython.wav', 'wb') as f:
    f.write(audio.data)

I'm not sure if this is the right way to do it, since my model isn't trained enough yet and the result is noisy.
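One caveat with Method 1: passing float32 data to scipy's wavwrite produces a 32-bit float WAV, which some players don't handle. A more portable option is to convert to 16-bit PCM first. Below is a minimal sketch using only the standard-library wave module; the random array is a stand-in for one generated example `_G_z[0, :, 0]` (float32 samples in [-1, 1]):

```python
import wave

import numpy as np

# Stand-in for _G_z[0, :, 0]: 16384 float32 samples in [-1, 1]
audio_float = np.random.uniform(-1.0, 1.0, 16384).astype(np.float32)

# Clip to [-1, 1] and scale to the int16 range for 16-bit PCM
audio_int16 = (np.clip(audio_float, -1.0, 1.0) * 32767).astype(np.int16)

with wave.open('test-int16.wav', 'wb') as f:
    f.setnchannels(1)      # mono
    f.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
    f.setframerate(16000)  # WaveGAN's default sample rate
    f.writeframes(audio_int16.tobytes())
```

The same conversion works before calling scipy's wavwrite: if the array you pass in is int16, the file it writes is 16-bit PCM rather than 32-bit float.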