huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Home Page: https://huggingface.co/docs/timm

[BUG] reg_token not working for ViT models

Tgaaly opened this issue

Describe the bug
If I try to create a 'vit_base_patch16_384' model, for example, and set the argument reg_token=4 (to add 4 register tokens to the model, per this paper: https://arxiv.org/pdf/2309.16588.pdf), the model fails to instantiate with a size-mismatch error raised from timm/layers/pos_embed.py, line 45. I hope I'm not missing something; I understand if this is not a supported feature yet.

To Reproduce
Steps to reproduce the behavior:

  1. Create a vit_base_patch16_384 model and pass the argument reg_token=4 (see the sketch below).
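
For concreteness, a minimal sketch of the failing call. Note the register-count argument is spelled `reg_tokens` on timm's VisionTransformer in recent versions; the exact parameter name may differ across releases:

```python
import timm

# Sketch of the reported failure (argument name assumed to be reg_tokens).
# With pretrained=True this raises a size-mismatch error: the checkpoint's
# position embedding was saved without register tokens, so it no longer
# lines up with the modified architecture.
model = timm.create_model('vit_base_patch16_384', pretrained=True, reg_tokens=4)
```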

Expected behavior
I would expect the model to be built/instantiated correctly.

@Tgaaly you can't use pretrained=True when adding reg tokens to an existing model def, since it changes the model architecture. It would be possible to add extra code to allow it, but I hacked it in and tried it, and there was a pretty big drop in performance, so I don't feel the extra code/maintenance overhead is warranted.

The current intent is to allow training/defining new models with reg tokens enabled. There are weights for the dinov2 ones, and I have a few smaller ViTs being trained right now with reg tokens.
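
A sketch of that supported path, assuming the `reg_tokens` argument and the DINOv2 register model names currently registered in timm:

```python
import timm

# Option 1: define a fresh model with register tokens (no checkpoint loaded)
# and run your own pretraining/training.
model = timm.create_model('vit_base_patch16_384', pretrained=False, reg_tokens=4)

# Option 2: load weights that were pretrained *with* registers, e.g. the
# DINOv2 register variants.
model = timm.create_model('vit_base_patch14_reg4_dinov2.lvd142m', pretrained=True)
```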

Also, I feel using a reg-token backbone that hasn't been pretrained with reg tokens would sort of defeat the purpose... it's a fairly fundamental change, and you'd want to do the pretraining with them.

Ah, that's right, makes sense. Thank you so much for your responses.