f-dangel / backpack

BackPACK - a backpropagation package built on top of PyTorch which efficiently computes quantities other than the gradient.

Home Page: https://backpack.pt/

Support for LayerNorm

Niccolo-Ajroldi opened this issue

I was trying to extend a Vision Transformer model using BackPACK. However, I encountered the following warning:

UserWarning: Extension saving to grad_batch does not have an extension for Module <class 'torch.nn.modules.normalization.LayerNorm'> although the module has parameters

I know that torch.nn.BatchNormNd leads to ill-defined first-order quantities and hence it is not implemented here. Does the same hold for Layer Normalization?

Thank you in advance!

Hi,

thanks for your question. The warning you get for LayerNorm appears because BackPACK currently does not support this layer.

In contrast to BatchNorm, however, this layer treats each sample in a mini-batch independently: the mean and variance used to normalize a sample are computed along its feature dimensions, whereas for BatchNorm they are computed along the batch dimension. Hence, first-order quantities like individual gradients are well-defined.
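As a quick sanity check (plain PyTorch, no BackPACK involved), this independence can be verified directly: summing per-sample gradients, each obtained by passing a single sample through the layer on its own, reproduces the full-batch gradient. The same check fails for BatchNorm, whose statistics couple the samples.

```python
import torch

torch.manual_seed(0)
N, D = 4, 8
x = torch.randn(N, D)
ln = torch.nn.LayerNorm(D)

# Full-batch gradient of a sum-reduced loss.
ln.zero_grad()
ln(x).sum().backward()
full_grad = ln.weight.grad.clone()

# Sum of per-sample gradients, each computed on a single sample in isolation.
per_sample_sum = torch.zeros_like(full_grad)
for n in range(N):
    ln.zero_grad()
    ln(x[n : n + 1]).sum().backward()
    per_sample_sum += ln.weight.grad

print(torch.allclose(full_grad, per_sample_sum))  # True: samples do not interact
```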

To add support for LayerNorm, the following example from the documentation is a good starting point. It describes how to write BackPACK extensions for new layers (the "Custom module extension" is the most relevant).
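For illustration, here is a minimal sketch of what a BatchGrad module extension for LayerNorm could look like, modeled on that custom-module example: it recomputes the normalized input from the input stored during the forward pass and contracts it with the output gradient. Names such as FirstOrderModuleExtension and module.input0 are taken from the documented example; the exact signatures should be checked against the current BackPACK version, so treat this as an untested starting point rather than a finished implementation.

```python
from backpack.extensions.firstorder.base import FirstOrderModuleExtension


class LayerNormBatchGrad(FirstOrderModuleExtension):
    """Sketch: individual gradients for ``torch.nn.LayerNorm``'s affine parameters."""

    def __init__(self):
        # Parameters for which individual gradients should be computed.
        super().__init__(params=["weight", "bias"])

    @staticmethod
    def _normalized_input(module):
        # Recompute x_hat = (x - mean) / sqrt(var + eps) over the normalized dims,
        # using the input BackPACK stored during the forward pass (module.input0).
        dims = tuple(range(-len(module.normalized_shape), 0))
        x = module.input0
        mean = x.mean(dim=dims, keepdim=True)
        var = x.var(dim=dims, unbiased=False, keepdim=True)
        return (x - mean) / (var + module.eps).sqrt()

    @staticmethod
    def _sum_extra_dims(module, tensor):
        # Sum out all dimensions except the batch dim and the normalized dims,
        # e.g. the sequence dimension for inputs of shape (batch, seq, features).
        extra = tuple(range(1, tensor.dim() - len(module.normalized_shape)))
        return tensor.sum(dim=extra) if extra else tensor

    def weight(self, ext, module, g_inp, g_out, bpQuantities):
        # Per-sample gradient w.r.t. weight: output gradient times normalized input.
        x_hat = self._normalized_input(module)
        return self._sum_extra_dims(module, g_out[0] * x_hat)

    def bias(self, ext, module, g_inp, g_out, bpQuantities):
        # Per-sample gradient w.r.t. bias: output gradient, summed over extra dims.
        return self._sum_extra_dims(module, g_out[0])
```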

I'd be happy to help merge a PR.

Best,
Felix

Any update on this?

No progress, and I don't have the capacity to work on this feature.

To break things down further, adding limited support for LayerNorm, e.g. only the BatchGrad extension, would be a feasible starting point. This can be achieved by following the custom-module example from the docs mentioned above; a rough end-to-end sketch follows.
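Assuming a module extension along the lines of the LayerNormBatchGrad sketch from the earlier comment, registering and using it would roughly follow the same pattern as the documented example (set_module_extension, extend, and the backpack context are used there as well); the wiring below is untested and only meant to show the shape of such a contribution.

```python
import torch
from backpack import backpack, extend
from backpack.extensions import BatchGrad

# LayerNormBatchGrad refers to the (hypothetical) sketch from the earlier comment.
model = extend(torch.nn.Sequential(torch.nn.LayerNorm(8), torch.nn.Linear(8, 2)))
loss_fn = extend(torch.nn.CrossEntropyLoss())

extension = BatchGrad()
extension.set_module_extension(torch.nn.LayerNorm, LayerNormBatchGrad())

x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
loss = loss_fn(model(x), y)
with backpack(extension):
    loss.backward()

for name, p in model.named_parameters():
    print(name, p.grad_batch.shape)  # one gradient per sample in the mini-batch
```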