Mis-implementation of JS divergence
greatwallet opened this issue
Hi, according to the definition of JS divergence (as given in your supplementary file), the JS divergence is computed as the entropy of the average probabilities minus the average of the per-model entropies.
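For reference, that definition (with $H$ denoting Shannon entropy and $p_m$ the probability map of the $m$-th of $M$ classifiers; the notation is mine) reads:

```math
\mathrm{JSD}(p_1,\dots,p_M) \;=\; H\!\Big(\tfrac{1}{M}\sum_{m=1}^{M} p_m\Big) \;-\; \tfrac{1}{M}\sum_{m=1}^{M} H(p_m)
```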
However, in your code the first term of the JS, i.e., the entropy of the average probabilities, is computed from `mean_seg`, which is defined as the average segmentation map over the 10 outputs of the ensembled `pixel_classifier`s.
Specifically, I have traced the implementation of `mean_seg` (datasetGAN_release/datasetGAN/train_interpreter.py, lines 291 to 294 in dee6d7d) and of `img_seg` (datasetGAN_release/datasetGAN/train_interpreter.py, lines 282 to 284 in dee6d7d).
In fact, the `img_seg` values are all unnormalized probabilities, i.e., logits in the sense of the argument to PyTorch's distributions. I think the code ends up averaging over logits instead of probabilities, since `Sigmoid` is commented out in `pixel_classifier` (datasetGAN_release/datasetGAN/train_interpreter.py, lines 68 to 92 in dee6d7d).
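A minimal sketch of why this matters (shapes and variable names here are illustrative, not the repo's exact code):

```python
import torch
import torch.nn.functional as F

# 10 ensemble members, 5 classes, for a single pixel.
logits = torch.randn(10, 5)

# What averaging the raw outputs amounts to: softmax of the mean logits.
softmax_of_mean = F.softmax(logits.mean(dim=0), dim=0)

# What the definition requires: mean of the per-model softmax probabilities.
mean_of_softmax = F.softmax(logits, dim=1).mean(dim=0)

print(torch.allclose(softmax_of_mean, mean_of_softmax))  # False in general
```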
TL;DR
Softmax does not commute with the linear averaging operation, so interchanging them mis-implements the JS divergence.
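For concreteness, here is a minimal sketch of the corrected computation under the definition above (the function name, the `(M, C, H, W)` shape, and `eps` are my assumptions, not the repo's code):

```python
import torch
import torch.nn.functional as F

def js_divergence(logits: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Per-pixel JSD = H(mean p) - mean H(p) for logits of shape (M, C, H, W)."""
    probs = F.softmax(logits, dim=1)       # per-model probabilities, (M, C, H, W)
    mean_prob = probs.mean(dim=0)          # average probabilities, (C, H, W)
    entropy_of_mean = -(mean_prob * (mean_prob + eps).log()).sum(dim=0)       # (H, W)
    mean_entropy = -(probs * (probs + eps).log()).sum(dim=1).mean(dim=0)      # (H, W)
    return entropy_of_mean - mean_entropy
```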
Thanks a lot for pointing this out. We are looking into it.
@greatwallet Thank you again for pointing this bug out!
We have fixed the bug in commit d9564d4.
The numbers in the README have also been updated.