mit-han-lab / lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention

Home Page: https://arxiv.org/abs/2004.11886


Transformer model with different parameters

ChuanyangZheng opened this issue · comments

Hello, I am confused by your results on WMT'14 En-De and WMT'14 En-Fr:
I wonder how you obtain the Transformer proposed by Vaswani et al. (2017) for WMT with different parameter counts such as 2.8M and 5.7M. By pruning, I guess?

Thank you for asking! As we mentioned in the paper, we omit the word embedding lookup table from the model parameters. : )
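For concreteness, here is a minimal sketch (not the repository's own counting script; the function name is illustrative) of how a non-embedding parameter count could be computed for a PyTorch model:

```python
import torch.nn as nn

def count_non_embedding_params(model: nn.Module) -> int:
    """Count trainable parameters, skipping nn.Embedding lookup tables."""
    embedding_weights = {
        id(m.weight) for m in model.modules() if isinstance(m, nn.Embedding)
    }
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad and id(p) not in embedding_weights
    )
```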

Thank you very much for your kind reply. However, you might have missed my point. I wonder how you compress the original Transformer into the different model sizes in Table 1. For example, the smallest 2.8M Transformer is much smaller than the original Transformer size of 45M (not counting word embedding).

Thank you for asking! As we mentioned in the paper, we shrink the embedding size of the model to reduce the number of parameters, following the settings in the Evolved Transformer.
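To illustrate the effect, here is a rough sketch using torch.nn.Transformer (which has no embedding table, so its parameter count is already "non-embedding"); the smaller dimensions below are illustrative, not the exact configurations from the paper:

```python
import torch.nn as nn

def transformer_size_m(d_model: int, ffn_dim: int, heads: int,
                       layers: int = 6) -> float:
    """Non-embedding parameter count (in millions) of a standard
    encoder-decoder Transformer built with torch.nn."""
    model = nn.Transformer(
        d_model=d_model,
        nhead=heads,
        num_encoder_layers=layers,
        num_decoder_layers=layers,
        dim_feedforward=ffn_dim,
    )
    return sum(p.numel() for p in model.parameters()) / 1e6

# Transformer base (Vaswani et al., 2017): d_model=512, ffn=2048, 8 heads
print(f"base : {transformer_size_m(512, 2048, 8):.1f}M")  # roughly 44M
# A shrunken variant, e.g. d_model=128, ffn=512, 4 heads (illustrative values)
print(f"small: {transformer_size_m(128, 512, 4):.1f}M")   # a few million
```

Shrinking the embedding/hidden dimension (and the FFN dimension with it) scales the per-layer weight matrices quadratically, which is how a ~44M non-embedding Transformer can be brought down to the few-million-parameter range reported in Table 1.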