kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku

CausalTransformerV2 or CausalTransformer?

leejason opened this issue

Was GPT-J-6B pretrained with CausalTransformerV2 or with the plain CausalTransformer, and why was that one chosen?

Thanks for any advice.