ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models

Home Page: http://ludwig.ai


Softmax missing from Torchvision models

saad-palapa opened this issue · comments

Describe the bug

I'm training an image classifier with Ludwig's TorchVision models.

The original models have a softmax operator in the last layer, but it is removed because it doesn't belong in the encoder. However, the softmax layer is never added back in the decoder. Is this intentional?

I need to calculate the softmax of the output. There are 3 ways I can do this going forward:

  • Add the softmax layer to the decoder
  • Add the softmax layer when exporting the model to Torchscript, ONNX, or CoreML
  • Leave things as is and calculate the softmax in the application (sketched below)
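
For the third option, here is a minimal sketch in plain PyTorch (the logits tensor stands in for the raw decoder output; the shapes and names are illustrative, not Ludwig API):

import torch
import torch.nn.functional as F

# Stand-in for the raw decoder output of the category feature:
# shape [batch_size, num_classes].
logits = torch.randn(8, 4)

# Normalize each row into a probability distribution.
probs = F.softmax(logits, dim=-1)
assert torch.allclose(probs.sum(dim=-1), torch.ones(8))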

Here is the debug print statement of the model architecture. I removed most of it for conciseness.

ECD(
  (input_features): LudwigFeatureDict(
    (module_dict): ModuleDict(
      (image_path__ludwig): ImageInputFeature(
        (encoder_obj): TVEfficientNetEncoder(
          (model): EfficientNet(
            (features): Sequential(
              (0): Conv2dNormActivation(
                (0): Conv2d(3, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
                (1): BatchNorm2d(24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
                (2): SiLU(inplace=True)
              )

              # --- removed for conciseness ---

              (7): Conv2dNormActivation(
                (0): Conv2d(256, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
                (1): BatchNorm2d(1280, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
                (2): SiLU(inplace=True)
              )
            )
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (classifier): Sequential(
              (0): Dropout(p=0.2, inplace=True)
              (1): Identity()
            )
          )
        )
      )
    )
  )
  (output_features): LudwigFeatureDict(
    (module_dict): ModuleDict(
      (label__ludwig): CategoryOutputFeature(
        (fc_stack): FCStack(
          (stack): ModuleList()
        )
        (reduce_sequence_input): SequenceReducer(
          (_reduce_obj): ReduceSum()
        )
        (decoder_obj): Classifier(
          (dense): Dense(
            (dense): Linear(in_features=1280, out_features=4, bias=True)
          )
        )
        (train_loss_function): SoftmaxCrossEntropyLoss(
          (loss_fn): CrossEntropyLoss()
        )
      )
    )
  )
  (combiner): ConcatCombiner(
    (fc_stack): FCStack(
      (stack): ModuleList()
    )
  )
)

To Reproduce

Python file:

import logging

from ludwig.api import LudwigModel

CONFIG = "/auto-ml/ludwig.yaml"

def train_classifier_ludwig(df, save_dir, model_name):
    model = LudwigModel(CONFIG, logging_level=logging.INFO)

    model.train(
        dataset=df,
        output_directory=save_dir,
        experiment_name="ludwig",
        model_name=model_name,
        skip_save_processed_input=True,
    )
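
A hypothetical invocation, assuming a DataFrame whose columns match the feature names in the YAML below (the paths and names here are illustrative):

import pandas as pd

df = pd.DataFrame({
    "image_path": ["/data/img_0.png", "/data/img_1.png"],
    "label": ["cat", "dog"],
})
train_classifier_ludwig(df, save_dir="/tmp/results", model_name="effnet_v2_m")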

YAML file:

trainer:
  epochs: 100
  early_stop: 10
  use_mixed_precision: false

input_features:
  - name: image_path
    type: image
    preprocessing:
      num_processes: 4
    encoder:
      type: efficientnet
      use_pretrained: True
      trainable: True
      model_cache_dir: null
      model_variant: v2_m
    fc_layers:
      - output_size: 128
        dropout: 0.4

output_features:
  - name: label
    type: category

Expected behavior

When running inference with an image classifier, the output probabilities should sum to 1.

Example values I'm getting from an image classifier with 4 classes:

[-1.0383801 -1.1289184  3.9636617 -0.988309 ]

However, it should be:

[0.00659277 0.0060221  0.98045385 0.00693128]
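
Applying a softmax to the logits above does reproduce the expected values:

import torch

logits = torch.tensor([-1.0383801, -1.1289184, 3.9636617, -0.988309])
print(torch.softmax(logits, dim=-1))
# tensor([0.0066, 0.0060, 0.9805, 0.0069])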

Environment:

  • OS: Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.2
  • Python 3.10.9
  • Ludwig version: latest from master, sha=890f261fa947ed9485065844fe1bd5a35460f6f4

Additional context

I'm not sure if this is related, but the SoftmaxCrossEntropyLoss module has no softmax operator in it. Is that intentional? Am I missing something here?

@skanjila, @ethanreidel, @arnavgarg1

Hey @saad-palapa. After the encoder there's a combiner and then the decoder. The decoder takes care of adding a final projection layer with a softmax (if it is a category decoder), or of doing whatever else is needed to produce predictions.

Tagging @jimthompson5802 too.

I'm not seeing the softmax in the debug output I posted.

Take a look at the yaml file. Is there something that is incorrect there?

The softmax is actually applied in the prediction module: _CategoryPredict.

https://github.com/ludwig-ai/ludwig/blob/master/ludwig/features/category_feature.py#L100-L135

Which is itself not part of the ECD model, you are right.

The reason is that at training time the softmax is not needed because the loss applies it, and at prediction time this module is used, which also determines whether or not to apply calibration.
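
For what it's worth, this matches PyTorch's own semantics: CrossEntropyLoss consumes raw logits and applies log-softmax internally, so an explicit softmax before the loss would be redundant. A quick sanity check:

import torch
import torch.nn.functional as F

logits = torch.randn(8, 4)
targets = torch.randint(0, 4, (8,))

# cross_entropy applies log-softmax to the logits internally...
ce = F.cross_entropy(logits, targets)
# ...so it is equivalent to nll_loss over explicit log-probabilities.
nll = F.nll_loss(F.log_softmax(logits, dim=-1), targets)
assert torch.allclose(ce, nll)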