yangsenius / TransPose

PyTorch Implementation for "TransPose: Keypoint localization via Transformer", ICCV 2021.

Paper: https://github.com/yangsenius/TransPose/releases/download/paper/transpose.pdf


Why does a model with a small number of parameters require so much memory?

FlyuZ opened this issue · comments

I found that this model has far fewer parameters and far fewer operations than HRNet, yet the memory it occupies during training is particularly large. Why is this? Is it a characteristic of ViT?
Thank you for your answer.

Hi, @FlyuZ. The number of parameters of this model is smaller than HRNet's, but its computation and memory usage are usually larger. You are right that this can be attributed to the characteristics of the Transformer. Self-attention computes pairwise inner products between all input positions, which needs only a few weight parameters but produces an N×N attention map (plus the corresponding intermediate activations) that must be kept in memory for the backward pass, so memory grows quadratically with the sequence length. A CNN mainly computes matrix multiplications between the input and the convolution kernel weights, so its activation memory grows only linearly with the input size.
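
For intuition, here is a minimal PyTorch sketch (not taken from this repository; the 64×48 feature-map size and `d_model = 96` are illustrative assumptions) comparing the parameter count of one self-attention layer with a 3×3 convolution, and the size of the N×N attention map that dominates memory:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: a 64x48 feature map flattened to N tokens of width d_model.
d_model = 96
N = 64 * 48                       # 3072 tokens
x = torch.randn(1, N, d_model)    # (batch, tokens, channels)

attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=1, batch_first=True)
conv = nn.Conv2d(d_model, d_model, kernel_size=3, padding=1)

def num_params(m):
    return sum(p.numel() for p in m.parameters())

# Weight memory: both are small and independent of N.
print("attention params:", num_params(attn))   # ~4 * d_model^2
print("conv params:     ", num_params(conv))   # ~9 * d_model^2

# Activation memory: the attention map alone is N x N per head per layer.
out, attn_map = attn(x, x, x, need_weights=True)
print("attention map shape:", attn_map.shape)  # (1, 3072, 3072) ~ 9.4M floats
print("conv-style activation size:", N * d_model)  # ~0.3M floats, linear in N
```

With these illustrative numbers, the attention layer holds fewer weights than the convolution, but a single attention map is roughly 30× larger than a convolutional activation of the same feature map, and this gap widens quadratically as the input resolution grows.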