mit-han-lab / lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention

Home Page: https://arxiv.org/abs/2004.11886


Transformer model with different parameters

ChuanyangZheng opened this issue · comments

Hello, I am confused by your results on WMT'14 En-De and WMT'14 En-Fr:
I wonder how you obtain the Transformer proposed by Vaswani et al. (2017) for WMT with different parameter counts such as 2.8M and 5.7M. By pruning, I guess?

Thank you for asking! As we mentioned in the paper, we omit the word embedding lookup table from the model parameters. : )
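For concreteness, here is a minimal sketch (not the repository's own counting script; the function name is illustrative) of how a non-embedding parameter count could be computed for a PyTorch model:

```python
import torch.nn as nn

def count_non_embedding_params(model: nn.Module) -> int:
    """Count trainable parameters, skipping nn.Embedding lookup tables."""
    embedding_weights = {
        id(m.weight) for m in model.modules() if isinstance(m, nn.Embedding)
    }
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad and id(p) not in embedding_weights
    )
```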

Thank you very much for your kind reply. However, you might have missed my point. I wonder how you compress the original Transformer into the different model sizes in Table 1. For example, the smallest 2.8M Transformer is much smaller than the original Transformer size of 45M (not counting word embedding).

Thank you for asking! As we mentioned in the paper, we shrink the embedding size of the model to reduce the number of parameters, following the settings in the Evolved Transformer.
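To illustrate the effect, here is a rough sketch using torch.nn.Transformer (which has no embedding table, so its parameter count is already "non-embedding"); the smaller dimensions below are illustrative, not the exact configurations from the paper:

```python
import torch.nn as nn

def transformer_size_m(d_model: int, ffn_dim: int, heads: int,
                       layers: int = 6) -> float:
    """Non-embedding parameter count (in millions) of a standard
    encoder-decoder Transformer built with torch.nn."""
    model = nn.Transformer(
        d_model=d_model,
        nhead=heads,
        num_encoder_layers=layers,
        num_decoder_layers=layers,
        dim_feedforward=ffn_dim,
    )
    return sum(p.numel() for p in model.parameters()) / 1e6

# Transformer base (Vaswani et al., 2017): d_model=512, ffn=2048, 8 heads
print(f"base : {transformer_size_m(512, 2048, 8):.1f}M")  # roughly 44M
# A shrunken variant, e.g. d_model=128, ffn=512, 4 heads (illustrative values)
print(f"small: {transformer_size_m(128, 512, 4):.1f}M")   # a few million
```

Shrinking the embedding/hidden dimension (and the FFN dimension with it) scales the per-layer weight matrices quadratically, which is how a ~44M non-embedding Transformer can be brought down to the few-million-parameter range reported in Table 1.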