PyTorch classification vit_model.py: could the feature sizes in the PatchEmbed forward function be wrong?

Question

XiaoluJiayou opened this issue 5 months ago · comments

Hello, I think the x.shape values in the forward function of class PatchEmbed in vit_model.py may be wrong.
In the PatchEmbed forward function, your example gives x.shape as [B, C, HW] after self.proj(x).flatten(2).
__**I think that after proj(x) the shape should be [B, embed_dim=768, 16×16 (patch_size)], not [B, C=3, HW (224×224)].**__
Please take a look at this issue. Thank you very much!

Ross Wightman · Answer 1 · Sun Apr 07 2024 23:22:39 GMT+0800 (China Standard Time)

@XiaoluJiayou it's correct, proj is a conv, input = output = BCHW ... flatten BCN, transpose BNC
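
To make the shape flow in this answer concrete, here is a minimal sketch that traces the shapes with plain Python arithmetic instead of running the model. It assumes the usual ViT-B/16 settings (224×224 input, patch_size=16, embed_dim=768) and that proj is a convolution with kernel_size = stride = patch_size and no padding. The helper names `conv2d_out_size` and `patch_embed_shapes` are hypothetical and do not come from vit_model.py.

```python
def conv2d_out_size(size, kernel, stride, padding=0):
    # Standard conv output-size formula (dilation = 1).
    return (size + 2 * padding - kernel) // stride + 1


def patch_embed_shapes(batch=1, in_chans=3, img_size=224,
                       patch_size=16, embed_dim=768):
    # Assumption: proj behaves like a Conv2d(in_chans, embed_dim,
    # kernel_size=patch_size, stride=patch_size) layer.
    grid = conv2d_out_size(img_size, patch_size, patch_size)
    num_patches = grid * grid
    return {
        "input": (batch, in_chans, img_size, img_size),      # B C H W
        "after_proj": (batch, embed_dim, grid, grid),        # B C' H' W'
        "after_flatten": (batch, embed_dim, num_patches),    # B C' N
        "after_transpose": (batch, num_patches, embed_dim),  # B N C'
    }


shapes = patch_embed_shapes()
for name, shape in shapes.items():
    print(name, shape)
# input (1, 3, 224, 224)
# after_proj (1, 768, 14, 14)
# after_flatten (1, 768, 196)
# after_transpose (1, 196, 768)
```

Under these assumptions, the "C" in BCHW after proj is the conv's output channel count (768), not the 3 input channels, and N is 14 × 14 = 196 patches rather than 16 × 16 or 224 × 224.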

XiaoluJiayou · Answer 2 · Tue Apr 09 2024 14:50:23 GMT+0800 (China Standard Time)

> @XiaoluJiayou it's correct, proj is a conv, input = output = BCHW ... flatten BCN, transpose BNC

Hello, sir, I think this could still have some problems. I wrote some code to test the input and output tensor sizes.

So, Please .......
Lastly, your code is very cool and has helped me a lot in learning deep learning and ViT. Thank you very much for your code!