facebookresearch / pytorchvideo

A deep learning library for video understanding research.

Home Page: https://pytorchvideo.org/


Training models with output activation

dibgerge opened this issue

❓ Questions on how to use PyTorchVideo

Some models, like X3D, have an output activation in their head, before the global pooling layer:

from pytorchvideo.models.hub import x3d_m

# Instantiate X3D-M and inspect its classification head.
model = x3d_m()
print(model.blocks[-1])

# ResNetBasicHead(
#   (pool): ProjectedPool(
#     (pre_conv): Conv3d(192, 432, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
#     (pre_norm): BatchNorm3d(432, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (pre_act): ReLU()
#     (pool): AvgPool3d(kernel_size=(16, 7, 7), stride=1, padding=0)
#     (post_conv): Conv3d(432, 2048, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
#     (post_act): ReLU()
#   )
#   (dropout): Dropout(p=0.5, inplace=False)
#   (proj): Linear(in_features=2048, out_features=400, bias=True)
#   (activation): Softmax(dim=1)
#   (output_pool): AdaptiveAvgPool3d(output_size=1)
# )

You can see the Softmax layer there.

I am working on fine-tuning a model, and this is confusing me a little bit:

  • Why is the softmax placed before the global pooling, and does it matter if I remove it so that I can directly use cross_entropy loss, which expects logits, during training? (See the sketch after this list for what I mean by removing it.)
  • In the training configs under pytorchvideo_trainer/conf/classification_x3d_xs.yaml, the loss function is defined as cross_entropy. However, cross_entropy expects raw logits, i.e. values before the softmax, so why is it configured like this?
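
For context, here is a minimal sketch of what I mean by removing it, where I just swap the head's activation for nn.Identity before fine-tuning (assuming that is a safe thing to do, which is exactly what I am asking):

import torch
import torch.nn as nn
from pytorchvideo.models.hub import x3d_m

model = x3d_m()

# Swap the head's Softmax for Identity so the model outputs raw logits.
model.blocks[-1].activation = nn.Identity()

# Dummy batch: (batch, channels, frames, height, width) for X3D-M.
clip = torch.randn(2, 3, 16, 224, 224)
labels = torch.randint(0, 400, (2,))

logits = model(clip)  # shape (2, 400), unnormalized now
loss = nn.functional.cross_entropy(logits, labels)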

Thanks for playing with PTV! The ordering is there for the multilabel case (AVA), where we apply a sigmoid and then average.
For the current case, I have set the default activation to None, so there won't be a softmax before pooling.
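
For fine-tuning, you can also build the model yourself and disable the head activation. A minimal sketch, assuming the create_x3d builder in your pytorchvideo version exposes the head_activation argument:

import torch
from pytorchvideo.models.x3d import create_x3d

# Build X3D with no output activation in the head, so the forward
# pass returns raw logits suitable for cross_entropy.
model = create_x3d(
    input_clip_length=16,
    input_crop_size=224,
    model_num_class=400,
    head_activation=None,
)

clip = torch.randn(1, 3, 16, 224, 224)
logits = model(clip)  # (1, 400), unnormalized

Note that for the multilabel case the placement matters: averaging per-location sigmoid scores is not the same as applying a sigmoid to the averaged logits, which is why the activation sits before the pooling there.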