CUNY-CL / yoyodyne

Small-vocabulary sequence-to-sequence generation with optional feature conditioning

Switch to CrossEntropyLoss?

kylebgorman opened this issue

Right now we apply LogSoftmax in the model and then compute NLLLoss, with a complicated custom version of the latter when label smoothing is enabled.

However, CrossEntropyLoss merges these two steps, and it also has built-in support for label smoothing. Moving to it would, I suspect, give us a small speedup. My first attempt to do this was not successful, however: the loss quickly plateaued at zero accuracy.
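
For concreteness, the claimed equivalence is easy to check; a minimal sketch (the shapes are made up for illustration):

import torch
from torch import nn

logits = torch.randn(8, 100)           # (batch, vocab); illustrative shapes
targets = torch.randint(0, 100, (8,))  # gold class indices

# LogSoftmax followed by NLLLoss...
nll = nn.NLLLoss()(nn.LogSoftmax(dim=-1)(logits), targets)
# ...matches CrossEntropyLoss applied directly to the raw logits.
ce = nn.CrossEntropyLoss()(logits, targets)
assert torch.allclose(nll, ce)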

Note that the transducer also has special-casing here.

@Adamits for discussion.

commented

@kylebgorman

  1. I tested this and found that we are not currently passing label smoothing through the CLI :(. I think we probably want to pass `**kwargs` to `model_cls` [here](https://github.com/CUNY-CL/yoyodyne/blob/master/yoyodyne/train.py#L301); see the sketch at the end of this comment.

  2. I fixed that and then just commented out the `log_softmax` in `AttentiveLSTMEncoderDecoder`:

output = self.classifier(output)
# output = self.log_softmax(output)  # CrossEntropyLoss takes raw logits
return output, hiddens

And then updated the loss function in BaseEncoderDecoder:

def _get_loss_func(
    self, reduction: str
) -> Callable[[torch.Tensor, torch.Tensor], torch.Tensor]:
    """Returns the actual function used to compute loss.

    Args:
        reduction (str): reduction for the loss function (e.g., "mean").

    Returns:
        Callable[[torch.Tensor, torch.Tensor], torch.Tensor]: configured
            loss function.
    """
    return nn.CrossEntropyLoss(
        ignore_index=self.pad_idx,
        reduction=reduction,
        label_smoothing=self.label_smoothing,
    )
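
For what it's worth, here is a small self-contained sketch (the pad index and tensor shapes are made up for illustration) showing that `ignore_index` keeps padded positions out of the loss:

import torch
from torch import nn

PAD_IDX = 0  # assumption for the sketch; the model exposes this as self.pad_idx
loss_func = nn.CrossEntropyLoss(ignore_index=PAD_IDX, label_smoothing=0.1)

logits = torch.randn(2, 5, 100)          # (batch, seq_len, vocab)
targets = torch.randint(1, 100, (2, 5))  # gold indices
targets[:, -2:] = PAD_IDX                # simulate padding at the end

# CrossEntropyLoss wants the class dimension second: (batch, vocab, seq_len).
loss_a = loss_func(logits.transpose(1, 2), targets)
logits[:, -2:, :] = torch.randn(2, 2, 100)  # perturb only the padded positions
loss_b = loss_func(logits.transpose(1, 2), targets)
assert torch.allclose(loss_a, loss_b)  # padded positions never contribute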

I also quickly convinced myself on pen and paper that this is equivalent to what we did before. As far as I can tell it works, but maybe I should run it against some benchmarks, and of course I need to double-check with respect to the transducer.
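
Regarding (1), here is a hypothetical, self-contained sketch of what forwarding the flag might look like; the names here (`DummyModel`, the flag spelling, the hard-coded argv) are stand-ins for illustration, not the actual train.py code:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--label_smoothing", type=float, default=0.0)
args = parser.parse_args(["--label_smoothing", "0.1"])  # stand-in for real CLI input

class DummyModel:  # stand-in for whatever class model_cls resolves to
    def __init__(self, **kwargs):
        self.label_smoothing = kwargs.get("label_smoothing", 0.0)

model_cls = DummyModel
model = model_cls(label_smoothing=args.label_smoothing)  # the flag reaches the model
assert model.label_smoothing == 0.1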

I can test this quickly, thanks.

commented

Sounds good! LMK if you want me to run any tests, or make the changes. I'll leave it to you otherwise.

Updating with notes from offline discussion: the current plan is to retain NLLLoss (and the smoothed variant, "le sigh") only for the pointer-generator, under its `_get_loss_func`, and to configure CrossEntropyLoss (which supports label smoothing natively) in the base class's `_get_loss_func`.
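
Roughly, the plan looks like this. This is a sketch assuming the names from the snippets above; the pointer-generator class name and the class attributes are stand-ins. The pointer-generator keeps NLLLoss because it already emits log-probabilities from its mixed generation/copy distribution:

from typing import Callable

import torch
from torch import nn

class BaseEncoderDecoder:
    pad_idx: int = 0             # assumed attributes, per the snippets above
    label_smoothing: float = 0.0

    def _get_loss_func(
        self, reduction: str
    ) -> Callable[[torch.Tensor, torch.Tensor], torch.Tensor]:
        # The base models emit raw logits, so CrossEntropyLoss applies
        # log_softmax internally and handles label smoothing itself.
        return nn.CrossEntropyLoss(
            ignore_index=self.pad_idx,
            reduction=reduction,
            label_smoothing=self.label_smoothing,
        )

class PointerGeneratorEncoderDecoder(BaseEncoderDecoder):
    def _get_loss_func(
        self, reduction: str
    ) -> Callable[[torch.Tensor, torch.Tensor], torch.Tensor]:
        # The pointer-generator already outputs log-probabilities, so it
        # retains NLLLoss (and the smoothed variant, when requested).
        return nn.NLLLoss(ignore_index=self.pad_idx, reduction=reduction)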

Closed in #133.