kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku

Finetuning GPT-NeoX-20B Using TPU v3-8s

nikhilanayak opened this issue

Would it be possible to finetune the 20B model using this repo and TPU v3-8s? If so, how many TPUs would be needed, and how would I have to change the code to make it work with more than one TPU?
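For context on what "more than one TPU" means in this codebase: mesh-transformer-jax lays devices out as a 2D mesh with a data-parallel axis (`dp`) and a model-parallel axis (`mp`), controlled by the `cores_per_replica` config value. Below is a minimal sketch of that mesh arithmetic, written against the modern `jax.sharding.Mesh` API rather than the repo's older `maps.mesh`; the concrete core counts are illustrative assumptions, not a confirmed recipe for a 20B checkpoint.

```python
# Sketch only: mesh-transformer-jax-style device layout, generalized beyond
# a single TPU board. Axis names ('dp', 'mp') and the cores_per_replica knob
# mirror the repo's convention; the concrete numbers are assumptions.
import numpy as np
import jax
from jax.sharding import Mesh  # modern stand-in for the repo's maps.mesh

cores_per_replica = 8             # model-parallel shards per replica (one full v3-8)
total_cores = jax.device_count()  # e.g. 32 on a v3-32 pod slice (assumption)

# Scaling past one board grows the data-parallel axis: the mesh shape is
# (dp, mp) = (total_cores // cores_per_replica, cores_per_replica).
assert total_cores % cores_per_replica == 0, "need a whole number of replicas"
mesh_shape = (total_cores // cores_per_replica, cores_per_replica)

devices = np.array(jax.devices()).reshape(mesh_shape)
with Mesh(devices, ("dp", "mp")):
    # The sharded train step would run here, with parameters split
    # along 'mp' and the batch split along 'dp'.
    pass
```

Note that all shards of one `mp` group must fit the model weights plus optimizer state in their combined HBM (a v3-8 has 128 GB total), so whether a 20B model can be squeezed into `cores_per_replica = 8` is exactly the open question here.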