chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Last layer of the generator in the CNN (size 16384)

spagliarini opened this issue · comments

Hi!

I would like to clean up a doubt I have.
Is the fact that the last layer of the generator has dimension 16384 related to CNN constraints or is it important that the dimension is a bit more than one would like to obtain? Here, 1 s.

Thank you in advance!!!

Ah, good question. The last layer of the generator has dimension 16384 simply because it is a power of four; each of the five layers of the generator increases the number of timesteps by a factor of four, starting from 16 (arbitrary choice).

This output length could represent any amount of time depending on sampling rate, but 16kHz is a common sampling rate in speech processing and conveniently works out to around one second of generated audio (our goal), so we went with that.