Gradient disappears in CIFAR10

Question

Gradient disappears in CIFAR10

huanranchen opened this issue 2 years ago · comments

Hi~ When training on cifar10, I discovered that even in the first iteration, the gradient of my model disappears. But if I fine tune pretrained ConvNeXt on cifar10, the gradient is normal.

I use convnext in pytorch, with learning_rate 4e-3, AdamW optimizer, batch_size 64.

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.model = models.convnext_tiny(pretrained=False)

        self.classifier = nn.Sequential(
            nn.Flatten(start_dim=1, end_dim=-1),
            nn.LayerNorm([768], eps=1e-6, elementwise_affine=True),
            nn.Linear(768, 60)
        )

    def forward(self, x):
        x=self.model.features(x)
        x = self.model.avgpool(x)
        x = self.classifier(x)

        return x

This problem is not only in cifar10, but also in NICO dataset where the input image size is 224x224. I tried to resize the input image to 448x448, but the gradient is still abnormal.

So, why the gradient will disappear~

Huanran Chen · Answer 1 · Fri May 06 2022 12:35:56 GMT+0800 (China Standard Time)

I don't know how I solved this problem. I use same code, same environment, and managed to train on cifar10 today.....