HHTseng / video-classification

@HHTseng Hi! Thanks for this repo. I noticed that the encoder never calls resnet.eval() during frame feature extraction. Since the ResNet weights are frozen, is this intentional? Does it help to keep the pretrained batch norm parameters while normalizing with the current batch statistics instead of the pretrained running statistics?

video-classification/ResNetCRNN/functions.py

Lines 325 to 341 in 99ebf20

    
    def forward(self, x_3d):
        cnn_embed_seq = []
        for t in range(x_3d.size(1)):
            # ResNet CNN
            with torch.no_grad():
                x = self.resnet(x_3d[:, t, :, :, :])  # ResNet
                x = x.view(x.size(0), -1)             # flatten output of conv

            # FC layers
            x = self.bn1(self.fc1(x))
            x = F.relu(x)
            x = self.bn2(self.fc2(x))
            x = F.relu(x)
            x = F.dropout(x, p=self.drop_p, training=self.training)
            x = self.fc3(x)

            cnn_embed_seq.append(x)
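
For reference, here is a minimal sketch of what the question proposes: keeping the frozen backbone in eval mode even when the rest of the encoder is training. The class name `FrozenEncoder`, the toy backbone, and the single `fc` head are assumptions made for illustration; they are not the repo's code, and the real encoder uses a pretrained ResNet and more FC layers.

```python
import torch
import torch.nn as nn


class FrozenEncoder(nn.Module):
    """Hypothetical encoder: frozen CNN backbone pinned to eval mode, trainable head."""

    def __init__(self, backbone, feat_dim, embed_dim):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False   # freeze backbone weights
        self.backbone.eval()          # use pretrained running stats from the start
        self.fc = nn.Linear(feat_dim, embed_dim)

    def train(self, mode=True):
        # model.train() would normally flip every submodule to training mode;
        # re-pin the backbone to eval mode so its batch norm stats stay fixed.
        super().train(mode)
        self.backbone.eval()
        return self

    def forward(self, x_3d):
        embed_seq = []
        for t in range(x_3d.size(1)):
            with torch.no_grad():
                x = self.backbone(x_3d[:, t])   # (batch, C, H, W) frame at time t
                x = x.view(x.size(0), -1)       # flatten backbone output
            embed_seq.append(self.fc(x))
        return torch.stack(embed_seq, dim=1)    # (batch, time, embed_dim)


# Toy stand-in for a pretrained ResNet (assumption: any CNN containing BatchNorm).
backbone = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.BatchNorm2d(4),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
)
enc = FrozenEncoder(backbone, feat_dim=4, embed_dim=2)
enc.train()                                     # head trains, backbone stays in eval
out = enc(torch.randn(2, 5, 3, 8, 8))           # 2 clips, 5 frames each
```

Overriding `train()` is one option among several. Calling `model.encoder.resnet.eval()` after every `model.train()` in the training loop would have the same effect, but it is easier to forget.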

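The batch norm parameters (weight and bias) are not updated, since the ResNet runs under torch.no_grad(). However, without resnet.eval() the batch norm layers stay in training mode, so they normalize with the current batch statistics and their running statistics (running_mean, running_var) are still updated on every forward pass.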
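
This behavior can be checked on a single layer. The snippet below is a small sketch that uses a standalone `nn.BatchNorm2d` as a stand-in for one batch norm layer inside the ResNet (the layer size and input shape are arbitrary assumptions). It compares `running_mean` before and after a forward pass under `torch.no_grad()`, first in training mode and then in eval mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for one batch norm layer of the frozen ResNet (toy size, assumption).
bn = nn.BatchNorm2d(3)
x = torch.randn(8, 3, 4, 4) * 2.0 + 5.0   # batch whose mean is far from 0

# Case 1: training mode + no_grad, as in the forward pass quoted above.
bn.train()
before = bn.running_mean.clone()
with torch.no_grad():
    bn(x)
changed_in_train = not torch.equal(before, bn.running_mean)

# Case 2: eval mode + no_grad: running stats are read, not written.
bn.eval()
before = bn.running_mean.clone()
with torch.no_grad():
    bn(x)
changed_in_eval = not torch.equal(before, bn.running_mean)

print(changed_in_train)  # True: no_grad does not stop the running-stat update
print(changed_in_eval)   # False: eval mode leaves running stats untouched
```

`torch.no_grad()` only disables gradient tracking. Whether batch norm updates its buffers is decided by the module's training flag.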

No resnet.eval() in encoder?