mit-han-lab / hardware-aware-transformers

[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

Home Page: https://hat.mit.edu


Training new SuperTransformer - calculating number of SubTransformer combinations?

ihish52 opened this issue · comments

Dear Authors,

Thanks for the great library. I am currently attempting to train a new SuperTransformer. The paper states that the default design space contains 10^15 SubTransformer configurations. Could you explain how this number is calculated, so that I can compute the number of SubTransformers in my own SuperTransformer design space?

Hi ihish52,

Our computation method is:

2 [encoder embedding dim] × 2 [decoder embedding dim] × (3 [encoder layer hidden dim] × 2 [encoder self-attn head number])^6 [encoder layer num] × ((3 [decoder layer hidden dim] × 2 [decoder self-attn head number] × 2 [decoder en-de attn head number] × 3 [arbitrary en-de attn])^6 [decoder layer num] + (3×2×2×3)^5 + (3×2×2×3)^4 + (3×2×2×3)^3 + (3×2×2×3)^2 + (3×2×2×3)^1) ≈ 0.42 × 10^15
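As a minimal sketch, the same arithmetic in Python (the per-dimension choice counts below are taken from the expression above and are assumptions about the default design space, not read from the code):

```python
# Count SubTransformer combinations in the default HAT design space.
# Choice counts are assumptions matching the expression above.
enc_embed_choices = 2              # encoder embedding dim: 2 choices
dec_embed_choices = 2              # decoder embedding dim: 2 choices
enc_layer_choices = 3 * 2          # per encoder layer: hidden dim (3) x self-attn heads (2)
dec_layer_choices = 3 * 2 * 2 * 3  # per decoder layer: hidden dim (3) x self-attn heads (2)
                                   #   x en-de attn heads (2) x arbitrary en-de attn (3)
num_enc_layers = 6                 # encoder depth is fixed at 6
max_dec_layers = 6                 # decoder depth is elastic, 1 to 6 layers

total = (
    enc_embed_choices
    * dec_embed_choices
    * enc_layer_choices ** num_enc_layers
    * sum(dec_layer_choices ** d for d in range(1, max_dec_layers + 1))
)
print(f"{total:.2e}")  # ~4.18e+14, i.e. ~0.42 x 10^15
```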

Hi Hanrui-Wang,

Thanks for walking through the calculation. Appreciate it!