yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Fail to obtain the reported MACs for performer-based models.

blackfeather-wang opened this issue

Hi,

Thank you for this repo! It is really helpful. However, we could not reproduce the reported MACs for the performer-based models (T2T-ViT-7/10/12), and we noticed a strange inconsistency along the way.

Both the paper and the code indicate that T2T-ViT-7/10/12 share the same architecture for the T2T module and the transformer layers, and differ only in the number of transformer layers. From the reported MACs (shown below), the MACs of a single transformer layer work out to (1.8 - 1.2) / 3 = (2.2 - 1.8) / 2 = 0.2G. Consequently, for T2T-ViT-7 we get 1.2 - 0.2*7 = -0.2 < 0, which would mean the T2T module has negative MACs! Could you please tell us whether we have miscalculated something?

[image: table of reported MACs]
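
The consistency check in the question can be replayed in a few lines. This is only a sketch of the arithmetic quoted above; the dictionary and function names are made up for illustration, and the figures are the reported values as cited in the question (in GMACs).

```python
# Reported MACs (in G) for T2T-ViT-7/10/12, as quoted in the question.
reported = {7: 1.2, 10: 1.8, 12: 2.2}

def per_layer_macs(depth_a, depth_b):
    """Per-layer cost implied by two models that differ only in depth."""
    return (reported[depth_b] - reported[depth_a]) / (depth_b - depth_a)

layer_7_10 = per_layer_macs(7, 10)
layer_10_12 = per_layer_macs(10, 12)

# Whatever remains after removing 7 layers would be the T2T module.
implied_t2t_module = reported[7] - 7 * layer_7_10

print(f"per layer (7 vs 10):  {layer_7_10:.3f} G")
print(f"per layer (10 vs 12): {layer_10_12:.3f} G")
print(f"implied T2T module:   {implied_t2t_module:.3f} G")
```

A negative remainder means the three reported totals cannot all be right if the models really differ only in depth.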

Hi, thanks for pointing this out.

For the three lite variants of T2T-ViT, each Transformer layer costs 0.125G MACs.
[image]
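
As a rough cross-check of the per-layer figure, the matrix multiplications in one standard Transformer layer can be counted by hand. The sketch below is not the authors' script. It assumes 197 tokens (14x14 patches plus a class token), an embedding dimension of 256, and an MLP ratio of 2 for the lite variants; these values are not stated in this thread and should be checked against the repo. It also ignores softmax, LayerNorm, and bias terms.

```python
def transformer_layer_macs(num_tokens, dim, mlp_ratio):
    """Count multiply-accumulates of the matmuls in one Transformer layer."""
    hidden = int(dim * mlp_ratio)
    qkv = num_tokens * dim * (3 * dim)           # joint Q, K, V projection
    attn_scores = num_tokens * num_tokens * dim  # Q @ K^T, summed over heads
    attn_values = num_tokens * num_tokens * dim  # attention weights @ V
    proj = num_tokens * dim * dim                # output projection
    mlp = 2 * num_tokens * dim * hidden          # two linear layers in the MLP
    return qkv + attn_scores + attn_values + proj + mlp

# Assumed configuration for the lite variants (hypothetical, see lead-in).
macs = transformer_layer_macs(num_tokens=197, dim=256, mlp_ratio=2)
print(f"{macs / 1e9:.3f} G MACs per layer")
```

Under these assumptions the count lands close to the 0.125G quoted in the reply. The head count does not appear in the formula because splitting `dim` across heads leaves the total attention cost unchanged.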

The MACs of T2T-ViT-7 are 0.125*7+0.7, where 0.7 is for the T2T module. So the MACs of T2T-ViT-7, T2T-ViT-10, and T2T-ViT-12 are 1.57G, 1.9G, and 2.2G. We will update the results in the repo.
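
The totals in this reply follow from a simple linear formula, sketched below. The function name and defaults are illustrative only; the defaults are the per-layer and T2T-module figures quoted in the reply, and `t2t_module` is left as a parameter so that a different module estimate can be plugged in.

```python
def total_macs(depth, per_layer=0.125, t2t_module=0.7):
    """Total MACs (in G): depth * per-layer cost + T2T module cost."""
    return depth * per_layer + t2t_module

totals = {depth: total_macs(depth) for depth in (7, 10, 12)}
for depth, macs in totals.items():
    print(f"T2T-ViT-{depth}: {macs:.3f} G")
```

With these inputs the formula gives 1.575, 1.95, and 2.2, which the reply writes as 1.57G, 1.9G, and 2.2G.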

Thank you for your reply.

How were the MACs of the T2T modules obtained? We tried to calculate them following the code, but got a much smaller value (~0.25G).

Hi.
I also got about 0.25G MACs for the T2T modules. I calculated the MACs following the code, and used the repo you suggested in another issue to count the MACs of the exp, sum, and divide operations. Can you give some hints? Thanks in advance.

Hi,

We double-checked the MACs of the T2T module, and they should be ~0.25G. We have updated the repo and will update the paper soon.

Hi @yuanli2333, thanks for the great work. Could you share the actual script you used to calculate the MACs for T2T-ViT and for the original ViT as well? It would be very helpful, since I have found that different papers report different MAC numbers for the original ViT too.