Training models with output activation
dibgerge opened this issue · comments
❓ Questions on how to use PyTorchVideo
Some models, like X3D, have an output activation in their head, before the global pooling layer:
```python
from pytorchvideo.models.hub import x3d_m

model = x3d_m()
model.blocks[-1]
# ResNetBasicHead(
#   (pool): ProjectedPool(
#     (pre_conv): Conv3d(192, 432, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
#     (pre_norm): BatchNorm3d(432, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (pre_act): ReLU()
#     (pool): AvgPool3d(kernel_size=(16, 7, 7), stride=1, padding=0)
#     (post_conv): Conv3d(432, 2048, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
#     (post_act): ReLU()
#   )
#   (dropout): Dropout(p=0.5, inplace=False)
#   (proj): Linear(in_features=2048, out_features=400, bias=True)
#   (activation): Softmax(dim=1)
#   (output_pool): AdaptiveAvgPool3d(output_size=1)
# )
```
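For context on why the placement matters: averaging per-position probabilities is not the same as taking a softmax of averaged logits, so removing the activation changes the model's outputs, not just their scale. A small NumPy sketch with toy logits (the shapes and values are hypothetical, just to illustrate the two orderings):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy per-position logits: 4 pooled spatiotemporal positions, 3 classes.
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 3.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [2.0, 0.0, 0.0]])

# Ordering used by the X3D head: softmax per position, then average.
probs_then_pool = softmax(logits, axis=1).mean(axis=0)

# The more common ordering: average the logits, then softmax once.
pool_then_probs = softmax(logits.mean(axis=0))

print(probs_then_pool)
print(pool_then_probs)
```

The two orderings generally disagree (here by about a percentage point on the top class), which is why the activation's position in the head is a deliberate design choice rather than cosmetic.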
You can see the Softmax layer there. I am working on fine-tuning a model, and this is confusing me a little bit:

- Why is the softmax before the global pooling, and does it matter if I remove it so that I can directly use the `cross_entropy` loss, which expects logits, when training?
- In the training configs under `pytorchvideo_trainer/conf/classification_x3d_xs.yaml`, the loss function is defined as `cross_entropy`; however, that expects the raw values before the softmax. Why is it set up like this?
Thanks for playing with PTV! The ordering is there for the multilabel AVA case, where we apply a sigmoid and then average. For the current case, I have set the default activation to None, so there won't be a softmax before the pooling.
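For fine-tuning with `cross_entropy` in the meantime, one workaround consistent with the answer above is to replace the head's activation with `nn.Identity()` (or build the head with `activation=None`). A minimal sketch using plain torch, where `proj`, `feats`, and `labels` are toy stand-ins mirroring the printed head rather than a real X3D forward pass:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for the tail of ResNetBasicHead: pooled clip features
# passing through the projection and the (optional) activation.
proj = nn.Linear(2048, 400)    # matches the printed head's proj shape
feats = torch.randn(2, 2048)   # hypothetical pooled features, batch of 2

# With Softmax(dim=1) in the head, the model emits probabilities;
# F.cross_entropy would then apply log_softmax a second time.
probs = nn.Softmax(dim=1)(proj(feats))

# For training with cross_entropy, drop the activation so the model
# emits raw logits, e.g. by overwriting it in place on a hub model:
#     model.blocks[-1].activation = nn.Identity()
logits = nn.Identity()(proj(feats))
labels = torch.tensor([3, 7])  # toy class targets
loss = F.cross_entropy(logits, labels)
```

At evaluation time you can still apply the softmax (or sigmoid, for multilabel) yourself on top of the logits.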