Softmax missing from Torchvision models
saad-palapa opened this issue · comments
Describe the bug
I'm training an image classifier with Ludwig's TorchVision models.
The original models have a softmax operator in the last layer, but it is removed because it doesn't belong in the encoder. However, the softmax layer is never added back in the decoder. Is this intentional?
I need to calculate the softmax of the output. There are 3 ways I can do this going forward:
- Add the softmax layer to the decoder
- Add the softmax layer when exporting the model to Torchscript, ONNX, or CoreML
- Leave things as is and calculate the softmax in the application
Here is the debug print statement of the model architecture. I removed most of it for conciseness.
ECD(
  (input_features): LudwigFeatureDict(
    (module_dict): ModuleDict(
      (image_path__ludwig): ImageInputFeature(
        (encoder_obj): TVEfficientNetEncoder(
          (model): EfficientNet(
            (features): Sequential(
              (0): Conv2dNormActivation(
                (0): Conv2d(3, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
                (1): BatchNorm2d(24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
                (2): SiLU(inplace=True)
              )
              // --- removed for conciseness ---
              (7): Conv2dNormActivation(
                (0): Conv2d(256, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
                (1): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
                (2): SiLU(inplace=True)
              )
            )
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (classifier): Sequential(
              (0): Dropout(p=0.2, inplace=True)
              (1): Identity()
            )
          )
        )
      )
    )
  )
  (output_features): LudwigFeatureDict(
    (module_dict): ModuleDict(
      (label__ludwig): CategoryOutputFeature(
        (fc_stack): FCStack(
          (stack): ModuleList()
        )
        (reduce_sequence_input): SequenceReducer(
          (_reduce_obj): ReduceSum()
        )
        (decoder_obj): Classifier(
          (dense): Dense(
            (dense): Linear(in_features=1280, out_features=4, bias=True)
          )
        )
        (train_loss_function): SoftmaxCrossEntropyLoss(
          (loss_fn): CrossEntropyLoss()
        )
      )
    )
  )
  (combiner): ConcatCombiner(
    (fc_stack): FCStack(
      (stack): ModuleList()
    )
  )
)
To Reproduce
Python file:
import logging

from ludwig.api import LudwigModel

CONFIG = "/auto-ml/ludwig.yaml"


def train_classifier_ludwig(df, save_dir, model_name):
    # Build the model from the YAML config and train it on the given DataFrame.
    model = LudwigModel(CONFIG, logging_level=logging.INFO)
    model.train(
        dataset=df,
        output_directory=save_dir,
        experiment_name="ludwig",
        model_name=model_name,
        skip_save_processed_input=True,
    )
YAML file:
trainer:
  epochs: 100
  early_stop: 10
  use_mixed_precision: false
input_features:
  - name: image_path
    type: image
    preprocessing:
      num_processes: 4
    encoder:
      type: efficientnet
      use_pretrained: True
      trainable: True
      model_cache_dir: null
      model_variant: v2_m
      fc_layers:
        - output_size: 128
          dropout: 0.4
output_features:
  - name: label
    type: category
Expected behavior
When running inference with an image classifier, the output probabilities should sum to 1.
Example values I'm getting from an image classifier with 4 classes:
[-1.0383801 -1.1289184 3.9636617 -0.988309 ]
However, it should be:
[0.00659277 0.0060221 0.98045385 0.00693128]
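For reference, the expected values are what a plain softmax over the reported outputs gives. The sketch below uses only the standard library; the `softmax` helper is written for this illustration and is not part of Ludwig.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Raw outputs reported above for the 4-class classifier.
logits = [-1.0383801, -1.1289184, 3.9636617, -0.988309]
probs = softmax(logits)

print([round(p, 6) for p in probs])
print(sum(probs))
```

This suggests the values being returned are the decoder's raw logits, so the predicted class (the argmax) is the same either way; only the normalization is missing.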
Environment:
- OS: Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.2
- Python 3.10.9
- Ludwig version: latest from master, sha=890f261fa947ed9485065844fe1bd5a35460f6f4
Additional context
I'm not sure if this is related, but there is a SoftmaxCrossEntropyLoss module that has no softmax operator in it. Is that intentional? Am I missing something here?
Hey @saad-palapa. After the encoder there's a combiner and then the decoder. The decoder takes care of adding a last projection layer with a softmax (if it is a category decoder), or of doing anything else needed to produce predictions.
Tagging @jimthompson5802 too.
I'm not seeing the softmax in the debug output I posted.
Take a look at the yaml file. Is there something that is incorrect there?
The softmax is actually applied in the prediction module: _CategoryPredict.
https://github.com/ludwig-ai/ludwig/blob/master/ludwig/features/category_feature.py#L100-L135
Which is not part of the ECD model itself, you are right.
The reason is that softmax is not needed at training time because the loss applies it, and at prediction time this prediction module is used, which also determines whether to apply calibration.
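To illustrate why no softmax layer is needed during training: a softmax cross-entropy loss takes raw logits and folds the log-softmax into the loss computation (PyTorch's `CrossEntropyLoss` likewise expects unnormalized logits). The following is a plain-Python sketch of that idea, not Ludwig's or PyTorch's actual code; both helper names are made up for this example.

```python
import math


def log_softmax(logits):
    """Log-probabilities computed directly from raw logits."""
    # log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy for one example; the softmax happens inside the loss."""
    return -log_softmax(logits)[target_index]


# Logits from the issue, assuming (for illustration only) the true class is index 2.
logits = [-1.0383801, -1.1289184, 3.9636617, -0.988309]
loss = softmax_cross_entropy(logits, target_index=2)
print(loss)
```

Because the loss normalizes internally, adding a softmax layer before it would apply the normalization twice, which is why the probabilities are produced only in the prediction step.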