CUNY-CL / yoyodyne

Small-vocabulary sequence-to-sequence generation with optional feature conditioning

Switch to CrossEntropyLoss?

kylebgorman opened this issue

Right now we apply LogSoftmax in the model and then compute NLLLoss, with a complicated custom version of the latter when label smoothing is enabled.

However, CrossEntropyLoss merges these two steps, and it also has built-in support for label smoothing. Moving to it would, I suspect, give us a small speedup. My first attempt to do this was not successful, however: the loss quickly plateaued at zero accuracy.
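
For concreteness, the claimed equivalence is easy to check; a minimal sketch (the shapes are made up for illustration):

import torch
from torch import nn

logits = torch.randn(8, 100)           # (batch, vocab); illustrative shapes
targets = torch.randint(0, 100, (8,))  # gold class indices

# LogSoftmax followed by NLLLoss...
nll = nn.NLLLoss()(nn.LogSoftmax(dim=-1)(logits), targets)
# ...matches CrossEntropyLoss applied directly to the raw logits.
ce = nn.CrossEntropyLoss()(logits, targets)
assert torch.allclose(nll, ce)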

Note that the transducer also has special-casing here.

@Adamits for discussion.

commented

@kylebgorman

  1. I tested this and found that we are not currently passing label smoothing through the CLI :(. I think we probably want to pass `**kwargs` to `model_cls` [here](https://github.com/CUNY-CL/yoyodyne/blob/master/yoyodyne/train.py#L301); see the sketch at the end of this comment.

  2. I fixed that and then just commented out the `log_softmax` in `AttentiveLSTMEncoderDecoder`:

output = self.classifier(output)
# output = self.log_softmax(output)  # CrossEntropyLoss takes raw logits
return output, hiddens

And then updated the loss function in BaseEncoderDecoder:

def _get_loss_func(
    self, reduction: str
) -> Callable[[torch.Tensor, torch.Tensor], torch.Tensor]:
    """Returns the actual function used to compute loss.

    Args:
        reduction (str): reduction for the loss function (e.g., "mean").

    Returns:
        Callable[[torch.Tensor, torch.Tensor], torch.Tensor]: configured
            loss function.
    """
    return nn.CrossEntropyLoss(
        ignore_index=self.pad_idx,
        reduction=reduction,
        label_smoothing=self.label_smoothing,
    )
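
For what it's worth, here is a small self-contained sketch (the pad index and tensor shapes are made up for illustration) showing that `ignore_index` keeps padded positions out of the loss:

import torch
from torch import nn

PAD_IDX = 0  # assumption for the sketch; the model exposes this as self.pad_idx
loss_func = nn.CrossEntropyLoss(ignore_index=PAD_IDX, label_smoothing=0.1)

logits = torch.randn(2, 5, 100)          # (batch, seq_len, vocab)
targets = torch.randint(1, 100, (2, 5))  # gold indices
targets[:, -2:] = PAD_IDX                # simulate padding at the end

# CrossEntropyLoss wants the class dimension second: (batch, vocab, seq_len).
loss_a = loss_func(logits.transpose(1, 2), targets)
logits[:, -2:, :] = torch.randn(2, 2, 100)  # perturb only the padded positions
loss_b = loss_func(logits.transpose(1, 2), targets)
assert torch.allclose(loss_a, loss_b)  # padded positions never contribute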

I also quickly convinced myself on pen and paper that this is equivalent to what we did before. As far as I can tell it works, but maybe I should run it against some benchmarks, and of course I need to double-check with respect to the transducer.
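
Regarding (1), here is a hypothetical, self-contained sketch of what forwarding the flag might look like; the names here (`DummyModel`, the flag spelling, the hard-coded argv) are stand-ins for illustration, not the actual train.py code:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--label_smoothing", type=float, default=0.0)
args = parser.parse_args(["--label_smoothing", "0.1"])  # stand-in for real CLI input

class DummyModel:  # stand-in for whatever class model_cls resolves to
    def __init__(self, **kwargs):
        self.label_smoothing = kwargs.get("label_smoothing", 0.0)

model_cls = DummyModel
model = model_cls(label_smoothing=args.label_smoothing)  # the flag reaches the model
assert model.label_smoothing == 0.1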

I can test this quickly, thanks.

commented

Sounds good! LMK if you want me to run any tests, or make the changes. I'll leave it to you otherwise.

Updating with notes from offline discussion: the current plan is to retain NLLLoss (and the smoothed variant, "le sigh") only for the pointer-generator, under its `_get_loss_func`, and to configure CrossEntropyLoss (which supports label smoothing natively) in the base class's `_get_loss_func`.
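
Roughly, the plan looks like this. This is a sketch assuming the names from the snippets above; the pointer-generator class name and the class attributes are stand-ins. The pointer-generator keeps NLLLoss because it already emits log-probabilities from its mixed generation/copy distribution:

from typing import Callable

import torch
from torch import nn

class BaseEncoderDecoder:
    pad_idx: int = 0             # assumed attributes, per the snippets above
    label_smoothing: float = 0.0

    def _get_loss_func(
        self, reduction: str
    ) -> Callable[[torch.Tensor, torch.Tensor], torch.Tensor]:
        # The base models emit raw logits, so CrossEntropyLoss applies
        # log_softmax internally and handles label smoothing itself.
        return nn.CrossEntropyLoss(
            ignore_index=self.pad_idx,
            reduction=reduction,
            label_smoothing=self.label_smoothing,
        )

class PointerGeneratorEncoderDecoder(BaseEncoderDecoder):
    def _get_loss_func(
        self, reduction: str
    ) -> Callable[[torch.Tensor, torch.Tensor], torch.Tensor]:
        # The pointer-generator already outputs log-probabilities, so it
        # retains NLLLoss (and the smoothed variant, when requested).
        return nn.NLLLoss(ignore_index=self.pad_idx, reduction=reduction)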

Closed in #133.