rishikksh20 / Fre-GAN-pytorch

https://arxiv.org/pdf/2106.02297.pdf

Section 2.3 of the paper states:
"After each level of DWT, all the frequency sub-bands are channel-wise concatenated and passed to convolutional layers"

Fre-GAN-pytorch/discriminator.py

Lines 242 to 246 in 91d0e46

    if i == 0:
        x = torch.cat([x, x_d1], dim=2)
    if i == 1:
        x = torch.cat([x, x_d2], dim=2)
    i = i + 1

You are concatenating along the length dimension (dim=2), which produces an odd-looking tensor where the first half is audio features and the second half is DWT features, so local waveform and DWT information can't mix properly.
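To make the concern concrete, here is a toy sketch in plain Python (no PyTorch). The helpers `cat_channels`, `cat_length` and `mixed_windows` are hypothetical names made up for illustration, and the sizes (1 channel, 8 time steps, kernel size 3) are assumptions, not the repo's values. It only looks at a single unpadded convolution layer; deeper layers widen the receptive field, so this shows the local effect rather than the behaviour of the whole discriminator.

```python
# Toy stand-ins for one batch item: 1 channel of waveform-branch features
# and 1 channel of DWT-branch features, each 8 time steps long.
T = 8
wave = [["w"] * T]  # layout: [channels][time]
dwt = [["d"] * T]   # layout: [channels][time]


def cat_channels(a, b):
    """Same layout change as torch.cat([a, b], dim=1) on a (B, C, T) tensor."""
    return a + b


def cat_length(a, b):
    """Same layout change as torch.cat([a, b], dim=2) on a (B, C, T) tensor."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def mixed_windows(x, kernel_size=3):
    """Count stride-1, unpadded windows whose receptive field covers
    both 'w' and 'd' entries. Returns (mixed, total)."""
    length = len(x[0])
    total = length - kernel_size + 1
    mixed = 0
    for start in range(total):
        seen = {v for row in x for v in row[start:start + kernel_size]}
        mixed += seen == {"w", "d"}
    return mixed, total


by_channel = cat_channels(wave, dwt)  # 2 channels x 8 steps
by_length = cat_length(wave, dwt)     # 1 channel x 16 steps

print(len(by_channel), len(by_channel[0]), mixed_windows(by_channel))
# 2 8 (6, 6)   -> every window sees both sources
print(len(by_length), len(by_length[0]), mixed_windows(by_length))
# 1 16 (2, 14) -> only the two windows straddling the seam see both
```

With channel-wise concatenation every kernel position covers aligned waveform and DWT features; with length-wise concatenation only the windows that cross the seam do.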

Is there any reason for this? I feel very confused looking at this, but you've done it twice so I assume there's some reason for this.

Hey @CookiePPP, as I understand it, the sentence "After each level of DWT, all the frequency sub-bands are channel-wise concatenated and passed to convolutional layers" refers to these lines:

Fre-GAN-pytorch/discriminator.py

Lines 228 to 235 in 91d0e46

    # DWT 1
    x_d1_high1, x_d1_low1 = self.dwt1d(x)
    x_d1 = self.dwt_conv1(torch.cat([x_d1_high1, x_d1_low1], dim=1))
    # DWT 2
    x_d2_high1, x_d2_low1 = self.dwt1d(x_d1_high1)
    x_d2_high2, x_d2_low2 = self.dwt1d(x_d1_low1)
    x_d2 = self.dwt_conv2(torch.cat([x_d2_high1, x_d2_low1, x_d2_high2, x_d2_low2], dim=1))
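
For readers less familiar with the DWT, the two-level decomposition and channel-wise stacking in these lines can be sketched in plain Python. This is only an illustration: it assumes a Haar wavelet, `haar_dwt` is a hypothetical helper (the thread does not show how `self.dwt1d` is implemented), the (high, low) return order simply mirrors the variable names above, and the `dwt_conv1` / `dwt_conv2` layers are left out.

```python
import math


def haar_dwt(x):
    """One level of an orthonormal Haar DWT on an even-length list.
    Returns (high, low), each half the length of the input."""
    s = 1 / math.sqrt(2)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return high, low


signal = [float(n) for n in range(8)]  # toy single-channel input, 8 samples

# Level 1: two sub-bands, each 4 samples long.
high1, low1 = haar_dwt(signal)
level1 = [high1, low1]                 # stacked as channels: 2 x 4

# Level 2: decompose both level-1 bands into four sub-bands of 2 samples.
hh, hl = haar_dwt(high1)
lh, ll = haar_dwt(low1)
level2 = [hh, hl, lh, ll]              # stacked as channels: 4 x 2

print(len(level1), len(level1[0]))     # 2 4
print(len(level2), len(level2[0]))     # 4 2
```

Each level halves the length and doubles the number of sub-bands, so stacking the sub-bands as channels keeps all of the signal in a shorter, wider tensor.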

@leminhnguyen
Thanks!
Yes, I see. I wonder whether the "and passed to convolutional layers" step was meant to be a channel-wise or a length-wise concat. 🤔

I suppose the difference in quality shouldn't be large; it probably just increases compute and training time, since the discriminator latents get longer after every layer.

Inconsistency with paper