ViT MAE reconstruction size mismatch
RhinigtasSalvex opened this issue
I'm trying to train a ViT with masked autoencoder (MAE) pretraining, but I'm getting an error when running MAE.forward().
The tensor of predicted pixel values is off by a factor of 4 compared to the masked_patches tensor in the MSE loss call.
RuntimeError: The size of tensor a (1024) must match the size of tensor b (4096) at non-singleton dimension 2
I've tried different settings, but the factor-4 size mismatch persists.
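For what it's worth, the expected number of pixel values per patch with my settings is patch_size**2 * channels = 32 * 32 * 1 = 1024, which matches tensor a, so it's the 4096 on the masked_patches side that I can't explain. Here is a minimal sketch of how I'm wiring things up (the image_size of 256 and the random batch are placeholders for my actual data pipeline):

```python
import torch
from vit_pytorch import ViT
from vit_pytorch.mae import MAE

# encoder built from the settings listed below
v = ViT(
    image_size = 256,   # placeholder; my real images differ
    patch_size = 32,
    num_classes = 1000,
    dim = 1024,
    depth = 5,          # 'encoder_depth' in my config
    heads = 8,
    mlp_dim = 2048,
    channels = 1,
)

mae = MAE(
    encoder = v,
    masking_ratio = 0.75,
    decoder_dim = 512,
    decoder_depth = 5,
)

img = torch.randn(8, 1, 256, 256)  # batch of single-channel images
loss = mae(img)  # in my actual run, the mse_loss inside this forward raises the size error above
loss.backward()
```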
I've also tried a hack to fix the size of the predicted pixel values, multiplying the neuron count of the to_pixels output layer by 4.
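Concretely, the hack was something like this (continuing from the sketch above; I know it only papers over the mismatch):

```python
import torch.nn as nn

# widen the reconstruction head by 4x so its output matches the
# 4096-wide masked_patches tensor (decoder_dim 512 -> 4 * 1024 outputs)
mae.to_pixels = nn.Linear(512, 4 * 1024)
```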
This fixes the MSE loss call but introduces a new problem: the gradients no longer match up in the backward call.
RuntimeError: Function MmBackward returned an invalid gradient at index 1 - got [4096, 1024] but expected shape compatible with [1024, 1024]
But now I don't know how to debug further.
My most recent settings were:
'model': {
'encoder_depth': 5,
'decoder_depth': 5,
'patch_size': 32,
'num_classes': 1000,
'channels': 1,
'dim': 1024,
'heads': 8,
'mlp_dim': 2048,
'masking_ratio': 0.75,
'decoder_dim': 512,
},
Hi Rhinigtas! Could you show what your full training script looks like? Perhaps I can spot the error more easily that way.
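In the meantime, a quick shape probe along these lines might help narrow down which side the 4096 comes from (assuming the stock MAE from this repo, with your encoder as v, the MAE wrapper as mae, and a sample batch as img):

```python
patches = mae.to_patch(img)
print(patches.shape)         # last dim: pixel values per patch as computed from the data
print(v.to_patch_embedding)  # the Rearrange + Linear stack the encoder expects
print(mae.to_pixels)         # out_features: pixel values per patch the decoder predicts
```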
Hi Lucidrains, I've uploaded a stripped-down version of my training script.