mit-han-lab / hardware-aware-transformers

[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

Home Page: https://hat.mit.edu


Training new SuperTransformer - calculating number of SubTransformer combinations?

ihish52 opened this issue · comments

Dear Authors,

Thanks for the great library. I am currently attempting to train a new SuperTransformer. The paper states that the default design space contains 10^15 SubTransformer configurations. Could you explain how this number is calculated, so that I can compute the number of SubTransformers in my own SuperTransformer design space?

Hi ihish52,

Our computation method is:

2 [encoder embedding dim] × 2 [decoder embedding dim] × (3 [encoder layer hidden dim] × 2 [encoder self-attn head number])^6 [encoder layer num] × ((3 [decoder layer hidden dim] × 2 [decoder self-attn head number] × 2 [decoder en-de attn head number] × 3 [arbitrary en-de attn])^6 [decoder layer num] + (3×2×2×3)^5 + (3×2×2×3)^4 + (3×2×2×3)^3 + (3×2×2×3)^2 + (3×2×2×3)^1) ≈ 0.42 × 10^15
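As a minimal sketch, the same arithmetic in Python (the per-dimension choice counts below are taken from the expression above and are assumptions about the default design space, not read from the code):

```python
# Count SubTransformer combinations in the default HAT design space.
# Choice counts are assumptions matching the expression above.
enc_embed_choices = 2              # encoder embedding dim: 2 choices
dec_embed_choices = 2              # decoder embedding dim: 2 choices
enc_layer_choices = 3 * 2          # per encoder layer: hidden dim (3) x self-attn heads (2)
dec_layer_choices = 3 * 2 * 2 * 3  # per decoder layer: hidden dim (3) x self-attn heads (2)
                                   #   x en-de attn heads (2) x arbitrary en-de attn (3)
num_enc_layers = 6                 # encoder depth is fixed at 6
max_dec_layers = 6                 # decoder depth is elastic, 1 to 6 layers

total = (
    enc_embed_choices
    * dec_embed_choices
    * enc_layer_choices ** num_enc_layers
    * sum(dec_layer_choices ** d for d in range(1, max_dec_layers + 1))
)
print(f"{total:.2e}")  # ~4.18e+14, i.e. ~0.42 x 10^15
```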

Hi Hanrui-Wang,

Thanks for walking through the calculation. Appreciate it!