facebookresearch / pytorchvideo

A deep learning library for video understanding research.

Home Page: https://pytorchvideo.org/


Training models with output activation

dibgerge opened this issue

❓ Questions on how to use PyTorchVideo

Some models, like X3D, have an output activation in their head, before the global pooling layer:

from pytorchvideo.models.hub import x3d_m

# Instantiate X3D-M and inspect its classification head.
model = x3d_m()
print(model.blocks[-1])

# ResNetBasicHead(
#   (pool): ProjectedPool(
#     (pre_conv): Conv3d(192, 432, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
#     (pre_norm): BatchNorm3d(432, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (pre_act): ReLU()
#     (pool): AvgPool3d(kernel_size=(16, 7, 7), stride=1, padding=0)
#     (post_conv): Conv3d(432, 2048, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
#     (post_act): ReLU()
#   )
#   (dropout): Dropout(p=0.5, inplace=False)
#   (proj): Linear(in_features=2048, out_features=400, bias=True)
#   (activation): Softmax(dim=1)
#   (output_pool): AdaptiveAvgPool3d(output_size=1)
# )

You can see the Softmax layer there.

I am working on fine-tuning a model, and this is confusing me a little bit:

  • Why is the softmax placed before the global pooling, and does it matter if I remove it so that I can directly use cross_entropy loss, which expects logits, during training? (See the sketch after this list for what I mean by removing it.)
  • In the training configs under pytorchvideo_trainer/conf/classification_x3d_xs.yaml, the loss function is defined as cross_entropy. However, cross_entropy expects raw logits, i.e. values before the softmax, so why is it configured like this?
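
For context, here is a minimal sketch of what I mean by removing it, where I just swap the head's activation for nn.Identity before fine-tuning (assuming that is a safe thing to do, which is exactly what I am asking):

import torch
import torch.nn as nn
from pytorchvideo.models.hub import x3d_m

model = x3d_m()

# Swap the head's Softmax for Identity so the model outputs raw logits.
model.blocks[-1].activation = nn.Identity()

# Dummy batch: (batch, channels, frames, height, width) for X3D-M.
clip = torch.randn(2, 3, 16, 224, 224)
labels = torch.randint(0, 400, (2,))

logits = model(clip)  # shape (2, 400), unnormalized now
loss = nn.functional.cross_entropy(logits, labels)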

Thanks for playing with PTV! The ordering is there for the multilabel case (AVA), where we apply a sigmoid and then average.
For the current case, I have set the default activation to None, so there won't be a softmax before pooling.
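
For fine-tuning, you can also build the model yourself and disable the head activation. A minimal sketch, assuming the create_x3d builder in your pytorchvideo version exposes the head_activation argument:

import torch
from pytorchvideo.models.x3d import create_x3d

# Build X3D with no output activation in the head, so the forward
# pass returns raw logits suitable for cross_entropy.
model = create_x3d(
    input_clip_length=16,
    input_crop_size=224,
    model_num_class=400,
    head_activation=None,
)

clip = torch.randn(1, 3, 16, 224, 224)
logits = model(clip)  # (1, 400), unnormalized

Note that for the multilabel case the placement matters: averaging per-location sigmoid scores is not the same as applying a sigmoid to the averaged logits, which is why the activation sits before the pooling there.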