huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Home Page: https://huggingface.co/docs/timm

PyTorch classification vit_model.py: could the feature sizes in the PatchEmbed class forward function be wrong?

XiaoluJiayou opened this issue · comments

Hello, I think the x.shape noted in the forward function of the PatchEmbed class in vit_model.py could be wrong.
In the PatchEmbed forward function, your example gives x.shape as [B, C, HW] after self.proj(x).flatten(2).
I think the shape after proj(x) should be [B, Embed_dim=768, 16×16 (patch_size)], not [B, C=3, HW (224×224)].
Please take a look at this issue. Thank you very much!

@XiaoluJiayou it's correct, proj is a conv, input = output = BCHW ... flatten BCN, transpose BNC
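For readers following along, the reply above can be checked with plain shape arithmetic. This is only a sketch, not timm's code: it assumes proj is a convolution whose kernel size and stride both equal the patch size, it uses the 224 / 16 / 768 values mentioned in the question, and the helper names `conv2d_out` and `patch_embed_shapes` are hypothetical.

```python
def conv2d_out(size, kernel, stride, padding=0):
    """Output length along one spatial axis of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_embed_shapes(batch=1, in_chans=3, img_size=224, patch_size=16, embed_dim=768):
    """Trace shapes through proj -> flatten(2) -> transpose(1, 2).

    Assumption: proj is a conv with kernel_size == stride == patch_size,
    so each output position corresponds to one non-overlapping patch.
    """
    grid = conv2d_out(img_size, patch_size, patch_size)  # patches per side
    return {
        "input": (batch, in_chans, img_size, img_size),   # B, C, H, W
        "proj": (batch, embed_dim, grid, grid),           # still B, C, H, W layout
        "flatten": (batch, embed_dim, grid * grid),       # B, C, N
        "transpose": (batch, grid * grid, embed_dim),     # B, N, C
    }


shapes = patch_embed_shapes()
for name, shape in shapes.items():
    print(f"{name:>9}: {shape}")
```

Under these assumptions the layout stays BCHW across the conv, but the sizes change: C becomes the embedding dimension (768) and H, W become the patch grid (14 x 14), so N is 196 rather than 224 x 224 or 16 x 16.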

Hello, sir, I think this could still have some problems. I wrote some code to test the input and output tensor sizes.
So, please .......
Lastly, your code is very cool and has helped me a lot in learning deep learning and ViT. Thank you very much for you and your code.