Model parallel transformers in JAX and Haiku
leejason opened this issue 2 years ago
Was GPT-J-6B pretrained with CausalTransformerV2 or simply CausalTransformer? Why?
Thanks for any advice.