Large time gap between GPU and CPU for the Rearrange op
Bobo-y opened this issue
Bobo-y commented
In vit.py, the `to_patch_embedding` code is:

```python
self.to_patch_embedding = nn.Sequential(
    Rearrange('b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1=patch_height, p2=patch_width),
    nn.Linear(patch_dim, dim),
)
```
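For reference, the `Rearrange` pattern above is equivalent to a plain-PyTorch reshape/permute patchify. A minimal sketch with hypothetical small sizes (the names `b, c, H, W, p1, p2` below are illustrative, not from vit.py):

```python
import torch

# Hypothetical small sizes for illustration
b, c, H, W = 2, 3, 8, 8
p1 = p2 = 4               # patch_height, patch_width
h, w = H // p1, W // p2   # number of patches per spatial dim

img = torch.randn(b, c, H, W)

# Plain-PyTorch equivalent of
# Rearrange('b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1=p1, p2=p2)
patches = (img.reshape(b, c, h, p1, w, p2)
              .permute(0, 2, 4, 3, 5, 1)        # -> b h w p1 p2 c
              .reshape(b, h * w, p1 * p2 * c))

print(patches.shape)  # torch.Size([2, 4, 48])
```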
I tested its inference time on CPU and GPU:
```python
def forward(self, img):
    begin = time.time()
    x = self.to_patch_embedding(img)
    print('time of to_patch_embedding {}'.format(time.time() - begin))
    b, n, _ = x.shape

    cls_tokens = repeat(self.cls_token, '() n d -> b n d', b=b)
    x = torch.cat((cls_tokens, x), dim=1)

    begin_pos = time.time()
    x += self.pos_embedding[:, :(n + 1)]
    print('time of pos_embedding {}'.format(time.time() - begin_pos))
    x = self.dropout(x)

    begin_trans = time.time()
    x = self.transformer(x)
    print('time of transformer {}'.format(time.time() - begin_trans))
```
When running on GPU, `to_patch_embedding` is much slower. How can we improve it?
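One caveat worth checking first: CUDA kernels launch asynchronously, so `time.time()` around a GPU op can measure little more than kernel launch overhead, and the first timed op in `forward` may also absorb the cost of earlier queued work. A minimal sketch of synchronized timing, using a stand-in `nn.Linear` and hypothetical sizes rather than the actual ViT module:

```python
import time
import torch

def timed(fn, x, device):
    """Time fn(x), synchronizing so we measure execution, not just kernel launch."""
    if device == 'cuda':
        torch.cuda.synchronize()  # drain any previously queued kernels
    begin = time.time()
    out = fn(x)
    if device == 'cuda':
        torch.cuda.synchronize()  # wait for fn's kernels to actually finish
    return out, time.time() - begin

# Illustrative use with a stand-in module; any nn.Module works the same way
layer = torch.nn.Linear(48, 64)
x = torch.randn(2, 4, 48)
out, elapsed = timed(layer, x, device='cpu')
print('elapsed: {:.6f}s, output shape: {}'.format(elapsed, tuple(out.shape)))
```

With synchronization in place, the per-stage numbers in `forward` become comparable between CPU and GPU.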