Last layer of the generator in the CNN (size 16384)

Question

Last layer of the generator in the CNN (size 16384)

spagliarini opened this issue 4 years ago · comments

Hi!

I would like to clean up a doubt I have.
Is the fact that the last layer of the generator has dimension 16384 related to CNN constraints or is it important that the dimension is a bit more than one would like to obtain? Here, 1 s.

Thank you in advance!!!

Chris Donahue · Answer 1 · Thu Mar 18 2021 00:11:13 GMT+0800 (China Standard Time)

Ah, good question. The last layer of the generator has dimension 16384 simply because it is a power of four; each of the five layers of the generator increases the number of timesteps by a factor of four, starting from 16 (arbitrary choice).

This output length could represent any amount of time depending on sampling rate, but 16kHz is a common sampling rate in speech processing and conveniently works out to around one second of generated audio (our goal), so we went with that.