huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Home Page: https://huggingface.co/docs/timm

[FEATURE] Feature only support for Twins-PVT and Mvitv2

L-Reichardt opened this issue

Both are pyramid networks and could be used for multi-scale feature extraction, but to my knowledge neither supports feature-only extraction the way similar architectures such as PVT or Swin do.

@L-Reichardt the efficient mechanism for feature extraction relies on a sequential stack at the stage level of the pyramid network. Many pure ViT / ViT-hybrid models need nn.ModuleList (and have extra args), or have extra root-level modules in the model that can't be sequentialized... I have a very rough draft of another approach that would address these, but another project is in the way right now...
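
To make the constraint above concrete, here is a toy sketch in plain Python (no timm or torch). It is not timm's actual implementation. Shapes stand in for tensors, and every name in it (`make_stage`, `collect_features`, `stage_needing_size`) is hypothetical. It only illustrates why a generic collector can walk a stack of stages that each take one input and return one output, and why a stage that needs an extra argument breaks that loop.

```python
from typing import Callable, List, Sequence, Tuple

# A (channels, height, width) tuple stands in for a real feature tensor.
Shape = Tuple[int, int, int]


def make_stage(out_channels: int, stride: int) -> Callable[[Shape], Shape]:
    """Hypothetical pyramid stage: one input in, one output out."""
    def stage(x: Shape) -> Shape:
        _, h, w = x
        return (out_channels, h // stride, w // stride)
    return stage


def collect_features(
    stages: Sequence[Callable],
    x: Shape,
    out_indices: Sequence[int],
) -> List[Shape]:
    """Generic collector: assumes every stage can be called as stage(x)."""
    features = []
    for i, stage in enumerate(stages):
        x = stage(x)
        if i in out_indices:
            features.append(x)
    return features


def stage_needing_size(x: Shape, size: Tuple[int, int]) -> Shape:
    """Hypothetical stage with an extra argument, like the 'extra args' case."""
    c, _, _ = x
    return (c, size[0], size[1])


# A sequential four-stage pyramid works with the generic loop.
stages = [make_stage(64, 4), make_stage(128, 2), make_stage(256, 2), make_stage(512, 2)]
pyramid = collect_features(stages, (3, 224, 224), out_indices=(0, 1, 2, 3))
print(pyramid)

# A stage that needs an extra argument cannot be driven by the same loop.
try:
    collect_features([stage_needing_size], (3, 224, 224), out_indices=(0,))
    generic_loop_failed = False
except TypeError:
    generic_loop_failed = True
print(generic_loop_failed)
```

Under these assumptions, the first call yields one shape per stage, while the second fails with a `TypeError` because the collector has no way to know about the extra `size` argument. Supporting such models would need either a per-model forward path or a different collection approach.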