Inconsistency with paper
CookiePPP opened this issue
https://arxiv.org/pdf/2106.02297.pdf
In Section 2.3:
"After each level of DWT, all the frequency sub-bands are channel-wise concatenated and passed to convolutional layers"
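One reading of the quoted sentence, sketched in NumPy under stated assumptions: a Haar wavelet, a 1D DWT along time, and every sub-band being split again at the next level. The function names are hypothetical, and this is not the repository's code. After level n, an input of shape (batch, channels, time) becomes 2^n sub-bands stacked on the channel axis, each 2^n times shorter.

```python
import numpy as np
from typing import List, Tuple

def haar_dwt_1d(x: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """One Haar DWT level along the last (time) axis of a (batch, channels, time) array."""
    if x.shape[-1] % 2:
        raise ValueError("time length must be even")
    even, odd = x[..., 0::2], x[..., 1::2]
    low = (even + odd) / np.sqrt(2.0)   # approximation (low-frequency) sub-band
    high = (even - odd) / np.sqrt(2.0)  # detail (high-frequency) sub-band
    return low, high

def multilevel_dwt_channel_concat(x: np.ndarray, levels: int) -> List[np.ndarray]:
    """After each DWT level, concatenate all sub-bands along the channel axis."""
    bands = [x]
    per_level = []
    for _ in range(levels):
        # Assumption: every current sub-band is split again (full wavelet-packet tree).
        bands = [band for b in bands for band in haar_dwt_1d(b)]
        per_level.append(np.concatenate(bands, axis=1))  # channel-wise concatenation
    return per_level

x = np.random.default_rng(0).standard_normal((2, 1, 64))  # (batch, channels, time)
for level, y in enumerate(multilevel_dwt_channel_concat(x, levels=3), start=1):
    print(level, y.shape)
# 1 (2, 2, 32)
# 2 (2, 4, 16)
# 3 (2, 8, 8)
```

Because the Haar transform used here is orthonormal, the stacked sub-bands keep the signal's energy; only the layout changes from long and narrow to short and wide.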
Fre-GAN-pytorch/discriminator.py
Lines 242 to 246 in 91d0e46
You are concatenating along the length (time) dimension, which produces an odd-looking tensor: the first half holds the audio features and the second half holds the DWT features, so local waveform and DWT information can't mix properly.
Is there a reason for this? I'm quite confused by it, but since you've done it twice I assume there is one.
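A small NumPy sketch of the two layouts being contrasted, with random arrays standing in for hypothetical same-length waveform features `h` and DWT features `d` (it does not reproduce lines 242 to 246). It also counts which windows of a hypothetical stride-1 convolution of width `kernel` see both halves of the length-wise tensor.

```python
import numpy as np
from typing import List

B, C, T = 1, 4, 16
rng = np.random.default_rng(0)
h = rng.standard_normal((B, C, T))  # hypothetical waveform-branch features
d = rng.standard_normal((B, C, T))  # hypothetical DWT-branch features, same length

channel_cat = np.concatenate([h, d], axis=1)  # (B, 2C, T): h[..., t] and d[..., t] share step t
length_cat = np.concatenate([h, d], axis=2)   # (B, C, 2T): d is appended after h along time

def straddling_windows(total_len: int, seam: int, kernel: int) -> List[int]:
    """Start indices of stride-1, width-`kernel` windows that cover both sides of `seam`."""
    return [s for s in range(total_len - kernel + 1) if s < seam <= s + kernel - 1]

print(channel_cat.shape)  # (1, 8, 16)
print(length_cat.shape)   # (1, 4, 32)
print(straddling_windows(2 * T, T, kernel=5))  # [12, 13, 14, 15]
```

With channel-wise concatenation every time step carries both kinds of features, so any kernel mixes them at matching positions. With length-wise concatenation only the `kernel - 1` windows at the seam see both halves, and they pair the end of `h` (t = 12..15 here) with the start of `d` (t = 0..3).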
Hey @CookiePPP, as I understand it, the sentence "After each level of DWT, all the frequency sub-bands are channel-wise concatenated and passed to convolutional layers" refers to these lines:
Fre-GAN-pytorch/discriminator.py
Lines 228 to 235 in 91d0e46
@leminhnguyen
Thanks!
Yes, I see. I wonder whether "and passed to convolutional layers" was meant as a channel-wise or a length-wise concatenation. 🤔
I suppose the difference in quality shouldn't be large; the main effect is probably extra compute and training time, because the discriminator latents get longer after every layer.
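A rough sketch of the compute point, tracking only the time length of each latent (not channel counts) for a hypothetical stack of stride-3 convolutions whose output length is ceil(input / stride). It assumes the DWT features concatenated after a layer have the same length as that layer's output, matching the "first half / second half" description above; the layer count, stride, number of concatenations, and input length are illustrative, not taken from the repository.

```python
from typing import List

def latent_lengths(t: int, stride: int, n_layers: int, n_concats: int, lengthwise: bool) -> List[int]:
    """Time length of the tensor passed on by each layer.

    After each of the first `n_concats` layers, same-length DWT features are
    concatenated: length-wise this doubles the time length, channel-wise it
    leaves the time length unchanged (the channel count doubles instead).
    """
    lengths = []
    cur = t
    for i in range(n_layers):
        cur = (cur + stride - 1) // stride  # strided conv output length, ceil(cur / stride)
        if lengthwise and i < n_concats:
            cur *= 2  # DWT features appended along time
        lengths.append(cur)
    return lengths

channel_wise = latent_lengths(8192, 3, 4, 2, lengthwise=False)
length_wise = latent_lengths(8192, 3, 4, 2, lengthwise=True)
print(channel_wise)  # [2731, 911, 304, 102]
print(length_wise)   # [5462, 3642, 1214, 405]
```

In this toy setting the last latent is about four times longer with length-wise concatenation (405 vs 102), and the gap compounds with each extra concatenation, which is where the additional compute and training time would come from.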